The Journal of China Universities of Posts and Telecommunications ›› 2023, Vol. 30 ›› Issue (2): 73-82. doi: 10.19682/j.cnki.1005-8885.2023.0002


Encrypted traffic classification based on fusion of vision transformer and temporal features

Wang Lanting1, Hu Wei2, Liu Jianyi1, Pang Jin2, Gao Yating2, Xue Jingyao1, Zhang Jie3

  1. Beijing University of Posts and Telecommunications
  2. Information and Telecommunication Branch, State Grid Corporation of China
  3. Information and Telecommunication Company, State Grid Shandong Electric Power Company
  • Received: 2022-03-24 Revised: 2022-11-21 Online: 2023-04-30 Published: 2023-04-27
  • Corresponding author: Liu Jianyi E-mail: liujy@bupt.edu.cn
  • Funding:
    Research on collaborative defense and coordinated response technology for network security equipment

Abstract:

Current encrypted traffic classification methods use only a single network framework, such as CNN, RNN, or SAE, and construct only a shallow network to extract features, which leads to low classification accuracy. To address this problem, we propose an encrypted traffic classification framework based on the fusion of Vision Transformer and temporal features. The framework uses BoTNet to extract spatial features and BiLSTM to extract temporal features. The two sub-networks run in parallel, after which the framework applies early fusion to combine their features and finally identifies encrypted traffic from the fused features. The experimental results show that the proposed method enhances encrypted traffic classification performance by fusing multi-dimensional features. The accuracy of VPN versus non-VPN binary classification is as high as 99.9%, and the accuracy of fine-grained twelve-class encrypted traffic classification can also exceed 99%.


Abstract:

Current encrypted traffic classification methods rely on a single network framework, such as a convolutional neural network (CNN), recurrent neural network (RNN), or stacked autoencoder (SAE), and extract features with only a shallow network, which leads to low classification accuracy. To address this problem, an encrypted traffic classification framework based on the fusion of vision transformer and temporal features is proposed. A bottleneck transformer network (BoTNet) is used to extract spatial features, and a bi-directional long short-term memory (BiLSTM) network is used to extract temporal features. The two sub-networks run in parallel, and their features are then combined by early fusion. Finally, the encrypted traffic is identified from the fused features. The experimental results show that the BiLSTM and BoTNet fusion transformer (BTFT) model enhances encrypted traffic classification performance by fusing multi-dimensional features. The accuracy of virtual private network (VPN) versus non-VPN binary classification is 99.9%, and the accuracy of fine-grained twelve-class encrypted traffic classification reaches 97%.

Key words: encrypted traffic classification, vision transformer, temporal feature
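
The parallel-then-fuse pipeline described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the two feature vectors are hand-written stand-ins for the outputs of the BoTNet and BiLSTM branches, early fusion is assumed here to mean plain concatenation (the abstract does not specify the operator), and all function names, weights, and sizes are hypothetical.

```python
import math


def early_fusion(spatial_features, temporal_features):
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(spatial_features) + list(temporal_features)


def linear_layer(features, weights, biases):
    """One fully connected layer: one score (logit) per traffic class."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]


def softmax(scores):
    """Numerically stable softmax over class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(spatial_features, temporal_features, weights, biases):
    """Fuse both branch outputs, then classify the fused vector."""
    fused = early_fusion(spatial_features, temporal_features)
    probs = softmax(linear_layer(fused, weights, biases))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical stand-ins for the two branch outputs of a single flow.
spatial = [0.2, 0.7, 0.1]   # would come from the spatial (BoTNet) branch
temporal = [0.9, 0.4]       # would come from the temporal (BiLSTM) branch

# Hypothetical weights for a two-class head (e.g. non-VPN vs. VPN).
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
biases = [0.0, 0.0]

label, probs = classify(spatial, temporal, weights, biases)
print(label, probs)
```

In a trained model the classifier weights would be learned jointly with both branches, so the fused vector lets the classifier weigh spatial and temporal evidence together rather than combining two separate predictions.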