The Journal of China Universities of Posts and Telecommunications, 2021, Vol. 28, Issue (4): 75-87. doi: 10.19682/j.cnki.1005-8885.2021.2007


Cross-project software defect prediction based on multi-source data sets

Huang Junfu, Wang Yawen, Gong Yunzhan, Jin Dahai   

  1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2019-10-18 Revised: 2021-06-11 Accepted: 2021-07-29 Online: 2021-08-31 Published: 2021-10-11
  • Contact: Corresponding author: Wang Yawen, E-mail: wangyawen@bupt.edu.cn

Abstract: Cross-project defect prediction (CPDP) builds a defect prediction model from one or more source projects and applies it to a target project. The data distributions of the source and target projects usually differ substantially, which makes it difficult to construct an effective defect prediction model. To alleviate negative transfer between the source and target projects in CPDP, this paper proposes an integrated transfer adaptive boosting (TrAdaBoost) algorithm based on multi-source data sets (MSITrA). The algorithm first applies an existing two-stage data filtering algorithm to select, from multiple source projects, the data related to the target project, and then builds a CPDP model with the integrated TrAdaBoost algorithm proposed in the paper. Experimental results on 15 public PROMISE data sets show that: 1) the proposed cross-project software defect prediction model performs best among all tested CPDP methods; 2) in the within-project software defect prediction (WPDP) experiment, the proposed CPDP method achieves better results than the tested WPDP method.
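
For readers unfamiliar with TrAdaBoost, the following is a minimal sketch of the instance re-weighting step of the standard TrAdaBoost algorithm on which the proposed method builds. It is not the MSITrA algorithm itself, whose details the abstract does not give. The function name `tradaboost_reweight`, its parameters, and the clipping bounds on the error are assumptions made for illustration only.

```python
import math


def tradaboost_reweight(weights, mistakes, n_source, n_rounds):
    """One weight update of standard TrAdaBoost (illustrative sketch).

    weights  : current instance weights, source instances first
    mistakes : 1 if the current weak learner misclassified instance i, else 0
    n_source : number of source-project instances at the front of the lists
    n_rounds : total number of boosting rounds planned
    """
    target_w = weights[n_source:]
    target_m = mistakes[n_source:]

    # The weighted error is measured on the target instances only.
    eps = sum(w * m for w, m in zip(target_w, target_m)) / sum(target_w)
    # Assumed clipping: keep eps inside (0, 0.5) so beta_t stays in (0, 1).
    eps = min(max(eps, 1e-10), 0.499)

    beta_t = eps / (1.0 - eps)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_rounds))

    updated = []
    for i, (w, m) in enumerate(zip(weights, mistakes)):
        if i < n_source:
            # Misclassified source instances lose weight.
            updated.append(w * beta_src ** m)
        else:
            # Misclassified target instances gain weight.
            updated.append(w * beta_t ** (-m))
    return updated
```

The asymmetry is the point of the technique: source instances that disagree with the target data are gradually down-weighted, which is how boosting-based transfer limits negative transfer, while hard target instances are emphasized as in ordinary AdaBoost.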

Key words: cross-project defect prediction, multi-source transfer adaptive boosting, ensemble learning
