The Journal of China Universities of Posts and Telecommunications ›› 2020, Vol. 27 ›› Issue (5): 34-46. doi: 10.19682/j.cnki.1005-8885.2020.0020

• Artificial Intelligence •

Web log classification framework with data augmentation based on GANs

He Mingshu, Jin Lei, Wang Xiaojuan, Li Yuan   

  1. Beijing University of Posts and Telecommunications
  • Received:2019-11-12 Revised:2020-05-10 Online:2020-10-22 Published:2020-10-23
  • Contact: Xiao-Juan WANG E-mail:wj2718@163.com
  • Supported by:
    the National Natural Science Foundation of China

Abstract: Attacks on web servers are among the most serious threats in network security. Analyzing the logs of web attacks is an effective way to identify malicious behavior. Traditionally, machine learning models trained on labeled data have been the popular identification methods, and deep learning models have recently been introduced to analyze logs through web log classification. However, these models are limited by the amount of labeled data available for training, because web logs labeled with specific attack categories are difficult to obtain. It is therefore necessary to address data generation, with a focus on learning feature representations similar to those of the original data, in order to improve the accuracy of the classification model. This paper proposes a novel framework with two main contributions. First, long short-term memory (LSTM) is incorporated into generative adversarial networks (GANs) to generate web attack logs. Second, a data augmentation model adds the GAN-generated web attack logs to the original dataset, which improves the performance of the classification model. Experimental results demonstrate the effectiveness of the proposed method: classification accuracy improved from 89.04% to 95.04%.

Key words: generative adversarial networks (GANs), web log, data augmentation, classification
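
The augmentation step described in the abstract, adding generated web attack logs to the original dataset before training the classifier, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the label strings, and the sample log lines are all hypothetical, and `placeholder_generator` merely stands in for a trained LSTM-based GAN generator, which is not reproduced here.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (raw log line, class label)


def augment_with_generated(
    original: List[Sample],
    generate: Callable[[int], List[str]],
    attack_label: str,
    n_generated: int,
    seed: int = 0,
) -> List[Sample]:
    """Return the original samples plus n_generated synthetic attack logs, shuffled."""
    # Label every generated log line with the attack class it imitates.
    synthetic = [(line, attack_label) for line in generate(n_generated)]
    combined = list(original) + synthetic
    # Shuffle with a fixed seed so the augmented set is reproducible.
    random.Random(seed).shuffle(combined)
    return combined


def placeholder_generator(n: int) -> List[str]:
    # Hypothetical stand-in for a trained generator: it returns fixed
    # dummy strings only so that the sketch can run end to end.
    return [f"GET /item?id={i}' OR '1'='1 HTTP/1.1" for i in range(n)]


# Tiny made-up dataset (assumption: labels are plain strings).
original = [
    ("GET /index.html HTTP/1.1", "normal"),
    ("GET /login?user=a HTTP/1.1", "normal"),
    ("GET /item?id=1' OR '1'='1 HTTP/1.1", "sqli"),
]

augmented = augment_with_generated(original, placeholder_generator, "sqli", 4)
```

The augmented list would then replace the original training set when fitting the classifier. How many synthetic samples to add per class is a tuning choice that this page does not specify.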
