The Journal of China Universities of Posts and Telecommunications (English edition), 2018, Vol. 25, Issue 6: 65-73. doi: 10.19682/j.cnki.1005-8885.2018.1028

• Artificial Intelligence •

Comparison of three data mining methods in predicting 5-year survival of colorectal cancer patients

Luo Yan, Sun Yawei, Fu Qunchao, Xue Tengfei, Zhou Ping   

  1. School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
    Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China
  • Received: 2018-08-07 Revised: 2019-01-04 Online: 2018-12-30 Published: 2019-02-26
  • Corresponding author: Zhou Ping, E-mail: zhouping4946@163.com
  • About the author: Zhou Ping, E-mail: zhouping4946@163.com
  • Supported by:
    This work was supported by the National Key Research and Development Program of China (2017YFC1307705).

Abstract: Predicting the survivability of colorectal cancer (CRC) patients remains a challenging research problem. Given the importance of predicting CRC patients' survival rates, we compared the performance of three data mining methods, decision trees (DTs), artificial neural networks (ANNs) and support vector machines (SVMs), in predicting 5-year survival of CRC patients, with the aim of assisting clinicians in making treatment decisions. The CRC dataset used to build the prediction models comes from the Surveillance, Epidemiology, and End Results (SEER) program. Five-fold cross-validation was used to measure predictive accuracy, and the random forest algorithm was used to measure feature importance. Experimental results show that ANNs (accuracy 0.73) and SVMs (accuracy 0.75) achieved higher predictive accuracy than DTs, and that both also achieved the best area under the receiver operating characteristic (ROC) curve (AUC = 0.82). These results suggest that ANNs and SVMs have high predictive power for 5-year survival of CRC patients.

Key words: data mining, 5-year survival, CRC, SEER
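
The evaluation protocol named in the abstract (5-fold cross-validation reporting accuracy and AUC) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the synthetic two-feature data stands in for the SEER records, and the hypothetical `fit_centroid` scorer is only a placeholder for the DT, ANN and SVM models compared in the paper. All function names here are assumptions made for the example.

```python
import math
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_centroid(X_train, y_train):
    """Hypothetical stand-in model: score by relative distance to the class centroids."""
    def centroid(label):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def score(x):
        d0, d1 = math.dist(x, c0), math.dist(x, c1)
        # Closer to the positive centroid gives a score above 0.5.
        return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

    return score


def cross_validate(X, y, fit, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average accuracy and AUC."""
    accs, aucs = [], []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in held_out]
        score = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_test = [y[j] for j in test_idx]
        s_test = [score(X[j]) for j in test_idx]
        accs.append(accuracy(y_test, [int(s >= 0.5) for s in s_test]))
        aucs.append(roc_auc(y_test, s_test))
    return sum(accs) / k, sum(aucs) / k


# Synthetic data (an assumption for the demo): label 1 is shifted by +2 in both features.
rng = random.Random(1)
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)]
     for label in (0, 1) for _ in range(100)]
y = [label for label in (0, 1) for _ in range(100)]

mean_acc, mean_auc = cross_validate(X, y, fit_centroid, k=5)
print(f"mean accuracy = {mean_acc:.2f}, mean AUC = {mean_auc:.2f}")
```

Passing the model as a `fit` callable keeps the fold logic independent of the classifier, so the same loop could be reused to compare several models on identical folds, which is the kind of comparison the abstract describes.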
