The Journal of China Universities of Posts and Telecommunications (English Edition), 2021, Vol. 28, Issue (4): 75-87. doi: 10.19682/j.cnki.1005-8885.2021.2007

• Others •

Cross-project software defect prediction based on multi-source data sets

Huang Junfu, Wang Yawen, Gong Yunzhan, Jin Dahai
  

  1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-10-18 Revised: 2021-06-11 Accepted: 2021-07-29 Online: 2021-08-31 Published: 2021-10-11
  • Corresponding author: Wang Yawen, E-mail: wangyawen@bupt.edu.cn

Abstract: Cross-project defect prediction (CPDP) uses one or more source projects to build a defect prediction model and applies that model to a target project. The data distributions of the source and target projects usually differ substantially, which makes it difficult to construct an effective defect prediction model. To alleviate negative transfer between the source and target projects in CPDP, this paper proposes an integrated transfer adaptive boosting (TrAdaBoost) algorithm based on multi-source data sets (MSITrA). The algorithm first applies an existing two-stage data filtering algorithm to select, from multiple source projects, the data related to the target project, and then uses the integrated TrAdaBoost algorithm proposed in the paper to build a CPDP model. Experimental results on 15 public data sets from the PROMISE repository show that: 1) the proposed cross-project defect prediction model performs best among all tested CPDP methods; 2) in the comparison with within-project defect prediction (WPDP), the proposed CPDP method achieves better results than the tested WPDP method.

Key words: cross-project defect prediction, multi-source transfer adaptive boosting, ensemble learning
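
The abstract does not spell out the MSITrA procedure, so the following is only a minimal sketch of one weight-update round of the standard TrAdaBoost algorithm from the transfer-learning literature, on which the proposed method builds. The function name `tradaboost_update`, its parameters, and the clamping of the error rate are illustrative assumptions, not the authors' implementation. The idea it shows is how negative transfer is damped: source-project instances that the current weak learner misclassifies lose weight, while misclassified target-project instances gain weight.

```python
import math


def tradaboost_update(w_src, w_tgt, miss_src, miss_tgt, n_rounds):
    """One round of the standard TrAdaBoost weight update (hypothetical helper).

    w_src, w_tgt: current weights of source and target instances.
    miss_src, miss_tgt: 1 if the current weak learner misclassified the
        instance, 0 otherwise.
    n_rounds: total number of boosting rounds N.
    Returns (new_w_src, new_w_tgt, eps), where eps is the weighted error
    measured on the target instances only.
    """
    # The weighted error is computed on target-project instances only.
    eps = sum(w * m for w, m in zip(w_tgt, miss_tgt)) / sum(w_tgt)
    # Assumption: clamp eps so the update stays defined (TrAdaBoost needs eps < 0.5).
    eps = min(max(eps, 1e-10), 0.499)
    beta_tgt = eps / (1.0 - eps)
    # Fixed shrink factor for source instances; depends on source size and N.
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(len(w_src)) / n_rounds))
    # Misclassified source instances are down-weighted (beta_src <= 1).
    new_src = [w * beta_src ** m for w, m in zip(w_src, miss_src)]
    # Misclassified target instances are up-weighted (1 / beta_tgt > 1).
    new_tgt = [w * beta_tgt ** (-m) for w, m in zip(w_tgt, miss_tgt)]
    return new_src, new_tgt, eps
```

In a full training loop this update would be repeated for each boosting round after fitting a weak learner on the combined, re-weighted source and target data; how MSITrA integrates several such learners across multiple filtered source projects is described in the paper itself and is not reproduced here.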
