The Journal of China Universities of Posts and Telecommunications ›› 2014, Vol. 21 ›› Issue (3): 35-45. doi: 10.1016/S1005-8885(14)60299-5

• Networks

Method of data cleaning for network traffic classification

Ruoyu Wang, Zhen Liu, Ling Zhang

  1. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
  • Received: 2014-01-06 Revised: 2014-04-02 Online: 2014-06-30 Published: 2014-06-30
  • Contact: Ruoyu Wang E-mail: rywang@scut.edu.cn
  • Supported by:

    Our work was supported by the National Basic Research Program of China (2009CB320505).

Abstract:

Network traffic classification aims at identifying the application types of network packets. It is important for Internet service providers (ISPs) that need to manage bandwidth resources and ensure the quality of service for different network applications. However, most machine-learning classification techniques focus only on high flow accuracy and ignore byte accuracy. Because elephant flows are far less numerous than mice flows on the Internet, such a classifier performs poorly on elephant flows, even though elephant flows consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system therefore cannot penalize elephant flows or avoid network congestion effectively. This article first explores the factors related to low byte accuracy, and then presents a new traffic classification method that improves byte accuracy with the aid of data cleaning. Experiments are carried out on three groups of real-world traffic datasets, and the method is compared with existing work in terms of improvement in byte accuracy. The experiments show that byte accuracy increases by about 22.31% on average, and that the method outperforms the existing one in most cases.
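
The gap between flow accuracy and byte accuracy can be illustrated with a minimal sketch. It assumes the commonly used definitions (flow accuracy: correctly classified flows over all flows; byte accuracy: bytes carried by correctly classified flows over all bytes); the function names, labels, and toy data are hypothetical and are not taken from the article.

```python
from typing import Sequence


def flow_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of flows whose predicted application label is correct."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def byte_accuracy(
    y_true: Sequence[str], y_pred: Sequence[str], flow_bytes: Sequence[int]
) -> float:
    """Fraction of all bytes that belong to correctly classified flows."""
    correct_bytes = sum(
        b for t, p, b in zip(y_true, y_pred, flow_bytes) if t == p
    )
    return correct_bytes / sum(flow_bytes)


# Hypothetical toy trace: nine 1 kB mice flows and one 991 kB elephant flow.
y_true = ["web"] * 9 + ["p2p"]
y_pred = ["web"] * 10  # the classifier mislabels only the elephant flow
flow_bytes = [1_000] * 9 + [991_000]

flow_acc = flow_accuracy(y_true, y_pred)              # 9 of 10 flows correct
byte_acc = byte_accuracy(y_true, y_pred, flow_bytes)  # 9 kB of 1000 kB correct
print(f"flow accuracy = {flow_acc:.3f}, byte accuracy = {byte_acc:.3f}")
```

In this toy case a single misclassified elephant flow leaves flow accuracy at 90% while byte accuracy falls below 1%, which is the kind of imbalance the abstract describes.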

Key words:

network traffic classification, byte accuracy, elephant flow, mice flow, machine learning
