Acta Metallurgica Sinica (English Letters), 2013, Vol. 20, Issue (5): 97-103. doi: 10.1016/S1005-8885(13)60096-5

Computer Science

Offline traffic analysis system based on Hadoop

Yuan-Yuan QIAO¹, Zhen-Ming LEI¹, Lun YUAN¹, Min-Jie GUO²

  1. Beijing Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Produce Ads, Amazon Joyo Co. Ltd, Beijing 100025, China
  • Received: 2013-03-05 Revised: 2013-06-06 Online: 2013-10-30 Published: 2013-10-29
  • Contact: Yuan-Yuan QIAO E-mail: qyybupt@gmail.com
  • Supported by:
    This work was supported by the Important National Science & Technology Specific Projects (2012ZX03002008), the National Natural Science Foundation of China (61072061) and The Fundamental Research Funds for the Central Universities (2012RC0121).

Abstract: Offline network traffic analysis is essential for an in-depth understanding of network conditions and characteristics, such as user behavior and abnormal traffic. With the rapid growth of information on the Internet, traditional stand-alone analysis tools face great challenges in storage capacity and computing efficiency, which are precisely the areas where a Hadoop cluster has the advantage. In this paper, we designed an offline traffic analysis system based on Hadoop (OTASH) and proposed a MapReduce-based algorithm for TopN user statistics. We also studied the computing performance and fault tolerance of OTASH. The experiments show that OTASH is suitable for handling large amounts of flow data and can still complete its computations when a single node fails.

Key words: MapReduce, Hadoop, cloud computing, traffic analysis
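The abstract does not describe how the TopN user statistics algorithm works, so the following is only a minimal sketch of one common MapReduce pattern for this task, not the authors' method. It simulates the data flow in memory rather than running on Hadoop. Assumptions (all hypothetical): each flow record is a text line "user_id,bytes", "TopN users" means the N users with the largest total traffic, and the helper names `map_phase`, `partition`, `reduce_phase`, and `top_n_users` are illustrative. Mappers emit (user, bytes), a hash partitioner sends every user to exactly one reducer, each reducer sums its users and keeps a local top N, and a final step merges the local lists.

```python
"""In-memory sketch of a two-stage MapReduce TopN job (hypothetical, not OTASH).

Assumed record format: "user_id,bytes"; TopN = N users with most total bytes.
"""
import heapq
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_phase(lines: Iterable[str]) -> Iterator[Pair]:
    """Mapper: parse each flow record and emit (user, bytes)."""
    for line in lines:
        user, nbytes = line.strip().split(",")
        yield user, int(nbytes)


def partition(pairs: Iterable[Pair], num_reducers: int) -> List[List[Pair]]:
    """Shuffle: route every key to exactly one reducer using a stable hash."""
    buckets: List[List[Pair]] = [[] for _ in range(num_reducers)]
    for user, nbytes in pairs:
        buckets[zlib.crc32(user.encode()) % num_reducers].append((user, nbytes))
    return buckets


def rank_key(item: Pair) -> Tuple[int, str]:
    """Order by traffic descending, then by user id ascending to break ties."""
    return (-item[1], item[0])


def reduce_phase(bucket: List[Pair], n: int) -> List[Pair]:
    """Reducer: sum traffic per user, then keep only this reducer's local top N."""
    totals: Dict[str, int] = defaultdict(int)
    for user, nbytes in bucket:
        totals[user] += nbytes
    return heapq.nsmallest(n, totals.items(), key=rank_key)


def top_n_users(lines: Iterable[str], n: int, num_reducers: int = 3) -> List[Pair]:
    """Second stage: merge the reducers' local top-N lists into the global top N.

    Correct because hash partitioning puts all records of a user in a single
    reducer, so every global top-N user is also in its reducer's local top N.
    """
    buckets = partition(map_phase(lines), num_reducers)
    local = [pair for bucket in buckets for pair in reduce_phase(bucket, n)]
    return heapq.nsmallest(n, local, key=rank_key)


if __name__ == "__main__":
    records = ["10.0.0.1,500", "10.0.0.2,300", "10.0.0.1,700",
               "10.0.0.3,900", "10.0.0.2,100", "10.0.0.4,50"]
    print(top_n_users(records, n=2))  # → [('10.0.0.1', 1200), ('10.0.0.3', 900)]
```

The two-stage design keeps each reducer's output bounded by N, so the final merge only sees at most N × (number of reducers) pairs. In a real Hadoop job, a combiner could pre-aggregate bytes per user on the map side to shrink the shuffle; that optimization is omitted here.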
