Web log classification framework with data augmentation based on GANs

doi:10.19682/j.cnki.1005-8885.2020.0020

The Journal of China Universities of Posts and Telecommunications, 2020, Vol. 27, Issue (5): 34-46. doi: 10.19682/j.cnki.1005-8885.2020.0020

• Artificial Intelligence •

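He Mingshu; Jin Lei; Wang Xiaojuan; Li Yuan

Beijing University of Posts and Telecommunications

Received: 2019-11-12 Revised: 2020-05-10 Online: 2020-10-22 Published: 2020-10-23
Corresponding author: Wang Xiaojuan E-mail: wj2718@163.com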

Supported by: the National Natural Science Fund of China

Abstract

Attacks on web servers are among the most serious threats in network security. Analyzing the logs of web attacks is an effective approach to identifying malicious behavior. Traditionally, machine learning models trained on labeled data have been the popular identification methods, and some deep learning models have recently been introduced that analyze logs through web log classification. However, these models are limited by the amount of labeled data available for training, because web logs with labels that mark specific categories are difficult to obtain. Consequently, it is necessary to address the problem of data generation, with a focus on learning feature representations similar to those of the original data, in order to improve the accuracy of the classification model. In this paper, a novel framework is proposed that differs in two important aspects. First, long short-term memory (LSTM) is incorporated into generative adversarial networks (GANs) to generate web attack logs. Second, a data augmentation model is proposed that adds the web attack logs generated by the GANs to the original dataset to improve the performance of the classification model. Experimental results demonstrate the effectiveness of the proposed method, which improved the classification accuracy from 89.04% to 95.04%.

Key words: generative adversarial networks (GANs), web log, data augmentation, classification
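
The augmentation step described in the abstract, adding generated attack logs to the original labeled dataset before training the classifier, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `augment_dataset`, `placeholder_generator`, the per-class target size, and the sample log lines are all hypothetical, and the placeholder merely stands in for a trained LSTM-based GAN generator.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

LabeledLog = Tuple[str, str]  # (raw log line, class label)


def augment_dataset(
    original: List[LabeledLog],
    generate: Callable[[str, int], List[str]],
    target_per_class: int,
    seed: int = 0,
) -> List[LabeledLog]:
    """Return the original data plus synthetic logs for under-represented classes.

    Assumption: each class is topped up to `target_per_class` samples.
    The abstract only states that generated logs are added to the original set.
    """
    by_label: Dict[str, List[str]] = {}
    for line, label in original:
        by_label.setdefault(label, []).append(line)

    augmented = list(original)
    for label, lines in by_label.items():
        missing = target_per_class - len(lines)
        if missing > 0:
            # Ask the generator for exactly as many samples as the class lacks.
            augmented.extend((line, label) for line in generate(label, missing))

    # Shuffle so synthetic and real samples are interleaved for training.
    random.Random(seed).shuffle(augmented)
    return augmented


def placeholder_generator(label: str, n: int) -> List[str]:
    """Hypothetical stand-in for sampling n logs from a trained GAN generator."""
    return [f"GET /synthetic/{label}/{i} HTTP/1.1" for i in range(n)]


original = [
    ("GET /index.html HTTP/1.1", "normal"),
    ("GET /about HTTP/1.1", "normal"),
    ("GET /item?id=1' OR '1'='1 HTTP/1.1", "sqli"),
]
augmented = augment_dataset(original, placeholder_generator, target_per_class=2)
counts = sorted(Counter(label for _, label in augmented).items())
print(counts)  # [('normal', 2), ('sqli', 2)]
```

In a real pipeline the classifier would then be trained on `augmented` and evaluated on held-out original logs only, so that any accuracy gain reflects the real data distribution.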

He Mingshu, Jin Lei, Wang Xiaojuan, Li Yuan. Web log classification framework with data augmentation based on GANs[J]. The Journal of China Universities of Posts and Telecommunications, 2020, 27(5): 34-46.
