The Journal of China Universities of Posts and Telecommunications, 2020, Vol. 27, Issue (5): 34-46. doi: 10.19682/j.cnki.1005-8885.2020.0020

Web log classification framework with data augmentation based on GANs

He Mingshu, Jin Lei, Wang Xiaojuan, Li Yuan   

  1. Beijing University of Posts and Telecommunications
  • Received: 2019-11-12 Revised: 2020-05-10 Online: 2020-10-22 Published: 2020-10-23
  • Contact: Xiao-Juan WANG
  • Supported by:
    the National Natural Science Fund of China

Abstract: Attacks on web servers are among the most serious threats in network security. Analyzing the logs of web attacks is an effective approach to identifying malicious behavior. Traditionally, machine learning models trained on labeled data have been the popular identification methods, and deep learning models have recently been introduced to analyze logs through web log classification. However, model training is limited by the amount of labeled data available, because web logs labeled with specific categories are difficult to obtain. It is therefore necessary to address the problem of data generation, with a focus on learning feature representations similar to those of the original data, in order to improve the accuracy of the classification model. In this paper, a novel framework is proposed that has two distinguishing aspects. First, long short-term memory (LSTM) is incorporated into generative adversarial networks (GANs) to generate logs of web attacks. Second, a data augmentation model is proposed that adds the GAN-generated web attack logs to the original dataset to improve the performance of the classification model. Experimental results demonstrate the effectiveness of the proposed method, which improved the classification accuracy from 89.04% to 95.04%.

Key words: generative adversarial networks (GANs), web log, data augmentation, classification
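
The augmentation step described in the abstract, adding generated web attack logs to the original labeled dataset before training a classifier, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `augment_with_generated`, the label encoding (0 = normal, 1 = attack), the example log lines, and `placeholder_generator` are all hypothetical. In particular, `placeholder_generator` only stands in for a trained LSTM-based GAN generator, which is not reproduced here.

```python
import random
from typing import Callable, List, Tuple

# Assumed sample format: (log line, label), with 0 = normal and 1 = attack.
Sample = Tuple[str, int]


def augment_with_generated(
    original: List[Sample],
    generate_attack_log: Callable[[], str],
    n_generated: int,
    seed: int = 0,
) -> List[Sample]:
    """Return the original samples plus n_generated synthetic attack samples, shuffled."""
    rng = random.Random(seed)
    # Every generated log is labeled as an attack, since the generator is
    # assumed to have been trained on attack logs only.
    synthetic = [(generate_attack_log(), 1) for _ in range(n_generated)]
    augmented = list(original) + synthetic
    rng.shuffle(augmented)
    return augmented


def placeholder_generator() -> str:
    # Hypothetical stand-in for sampling from a trained LSTM-based GAN generator.
    return "GET /index.php?id=1%27%20OR%20%271%27=%271 HTTP/1.1"


# Toy dataset with illustrative (made-up) log lines.
original = [
    ("GET /index.html HTTP/1.1", 0),
    ("GET /search?q=<script>alert(1)</script> HTTP/1.1", 1),
]
augmented = augment_with_generated(original, placeholder_generator, n_generated=3)
```

The augmented list would then be used in place of the original training set when fitting the classification model; the test set should contain only original logs so that the reported accuracy reflects real data.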