Method of data cleaning for network traffic classification

doi:10.1016/S1005-8885(14)60299-5

The Journal of China Universities of Posts and Telecommunications (English Edition), 2014, Vol. 21, Issue (3): 35-45. doi: 10.1016/S1005-8885(14)60299-5

WANG Ruo-yu, LIU Zhen, ZHANG Ling

School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China

Received: 2014-01-06 Revised: 2014-04-02 Published: 2014-06-30 Online: 2014-06-30
Corresponding author: WANG Ruo-yu E-mail: rywang@scut.edu.cn
Supported by:
This work was supported by the National Basic Research Program of China (2009CB320505).

Abstract

Network traffic classification aims to identify the application types of network packets. It is important for Internet service providers (ISPs) that need to manage bandwidth resources and ensure quality of service for different network applications. However, most machine-learning classification techniques focus only on high flow accuracy and ignore byte accuracy. Because of the imbalance between elephant flows and mice flows on the Internet, such a classifier performs poorly on elephant flows, even though elephant flows consume much more bandwidth than mice flows. When the classifier is deployed for traffic policing, the network management system cannot effectively penalize elephant flows or avoid network congestion. This article first explores the factors related to low byte accuracy and then presents a new traffic classification method that improves byte accuracy with the aid of data cleaning. Experiments are carried out on three groups of real-world traffic datasets, and the method is compared with existing work in terms of byte-accuracy improvement. The experiments show that byte accuracy increases by about 22.31% on average, and the method outperforms the existing one in most cases.
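
The gap between flow accuracy and byte accuracy can be illustrated with a small sketch. It assumes the commonly used definitions (flow accuracy as the fraction of flows labelled correctly, byte accuracy as the fraction of bytes carried by correctly labelled flows); the abstract does not spell these out, and the flow records and function names below are invented for illustration rather than taken from the article.

```python
from typing import List, Tuple

# Hypothetical flow record: (true_label, predicted_label, size_in_bytes).
Flow = Tuple[str, str, int]


def flow_accuracy(flows: List[Flow]) -> float:
    """Fraction of flows whose predicted label matches the true label."""
    correct = sum(1 for true, pred, _ in flows if true == pred)
    return correct / len(flows)


def byte_accuracy(flows: List[Flow]) -> float:
    """Fraction of bytes that belong to correctly labelled flows."""
    total_bytes = sum(size for _, _, size in flows)
    correct_bytes = sum(size for true, pred, size in flows if true == pred)
    return correct_bytes / total_bytes


# Made-up example: nine small (mice) flows classified correctly and
# one large (elephant) flow classified incorrectly.
flows: List[Flow] = [("web", "web", 1_000)] * 9 + [("p2p", "web", 91_000)]

print(f"flow accuracy: {flow_accuracy(flows):.2f}")
print(f"byte accuracy: {byte_accuracy(flows):.2f}")
```

In this toy data a single misclassified elephant flow leaves flow accuracy high while byte accuracy collapses, which is the kind of imbalance the abstract describes.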

Keywords:

network traffic classification, byte accuracy, elephant flow, mice flow, machine learning

CLC number:

TP393

WANG Ruo-yu, LIU Zhen, ZHANG Ling. Method of data cleaning for network traffic classification[J]. The Journal of China Universities of Posts and Telecommunications, 2014, 21(3): 35-45.
