The Journal of China Universities of Posts and Telecommunications, 2020, Vol. 27, Issue (2): 82-90. doi: 10.19682/j.cnki.1005-8885.2020.1010

• Others •

Malware variants detection based on ensemble learning

Ma Yan, Du Donggao   

  1. Network and Information Center, Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
    2. National Engineering Laboratory for Mobile Network Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2020-02-05 Revised: 2020-05-09 Online: 2020-04-30 Published: 2020-07-07
  • Contact: Du Donggao, E-mail: dudonggao@126.com
  • About author: Du Donggao, E-mail: dudonggao@126.com
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61601041), the Fundamental Research Funds for the Central Universities (2018RC55), and the Beijing Talents Foundation (2017000020124G062).

Abstract: An application programming interface (API) is a procedure call interface to operating system resources. Behavior features based on API calls can capture the malicious behaviors of malware variants. However, existing malware detection approaches involve complex operations for constructing and matching these features. Furthermore, many approaches adopt graph matching, which is a nondeterministic polynomial (NP)-complete problem and is therefore computationally expensive. To address these problems, a novel approach is proposed to detect malware variants. First, the APIs called by the malware are classified by their functions and parameters. Then, a classified behavior graph (CBG) is constructed from the API call sequences. Finally, a CBG-based signature is generated for each malware family, and malware variants are classified with an ensemble learning algorithm. Experiments on 1 220 malware samples show that, with ensemble learning, the true positive rate (TPR) reaches up to 89.0% with a low false positive rate (FPR) of 3.7%.
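The abstract outlines a pipeline of API classification, CBG construction, per-family signatures and ensemble classification. The Python sketch below illustrates one possible reading of that pipeline and is not the authors' implementation. Every concrete choice is a hypothetical assumption: the API-to-category table, the Run-key parameter rule, the representation of a CBG as a set of transitions between consecutive behavior classes, signatures built from edges shared by all samples of a family, and a plain majority vote over three similarity scorers. The abstract does not name the ensemble algorithm the paper uses, and the Win32 function names serve only as labels.

```python
from collections import Counter

# Hypothetical API-to-behavior table (assumption); the paper's actual
# categories are not listed in the abstract.
API_CATEGORIES = {
    "CreateFileW": "file",
    "WriteFile": "file",
    "RegOpenKeyExW": "registry",
    "RegSetValueExW": "registry",
    "connect": "network",
    "send": "network",
    "CreateProcessW": "process",
}


def classify_call(api_name, params):
    """Map one API call to a behavior class using its name and parameters."""
    category = API_CATEGORIES.get(api_name, "other")
    # Illustrative parameter rule (assumption): registry access to a Run key
    # is treated as a separate "persistence" behavior.
    if category == "registry" and "\\Run" in params.get("key", ""):
        return "persistence"
    return category


def build_cbg(calls):
    """Return a CBG as the set of directed edges between consecutive classes."""
    classes = [classify_call(name, params) for name, params in calls]
    return set(zip(classes, classes[1:]))


def family_signature(graphs, min_support=1.0):
    """Keep edges that occur in at least min_support of the family's graphs."""
    counts = Counter(edge for graph in graphs for edge in graph)
    return {edge for edge, n in counts.items() if n >= min_support * len(graphs)}


def coverage(graph, signature):
    """Fraction of the signature's edges present in the graph."""
    return len(graph & signature) / len(signature) if signature else 0.0


def jaccard(graph, signature):
    """Jaccard similarity between the graph's and the signature's edge sets."""
    union = graph | signature
    return len(graph & signature) / len(union) if union else 0.0


def predict(graph, signatures):
    """Plain majority vote over three similarity-based base classifiers."""
    scorers = [
        lambda sig: coverage(graph, sig),
        lambda sig: jaccard(graph, sig),
        lambda sig: len(graph & sig),
    ]
    # Each base classifier votes for the family whose signature scores highest.
    votes = [max(signatures, key=lambda fam: score(signatures[fam]))
             for score in scorers]
    # most_common breaks ties in favor of the vote seen first.
    return Counter(votes).most_common(1)[0][0]


def tpr_fpr(tp, fn, fp, tn):
    """TPR = TP / (TP + FN); FPR = FP / (FP + TN)."""
    return tp / (tp + fn), fp / (fp + tn)
```

Because the signature and the CBG are both plain edge sets, matching reduces to set intersection rather than general graph matching. Whether the paper avoids graph matching in this particular way is not stated in the abstract.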

Key words:

classified behavior, malware variant, ensemble learning