The Journal of China Universities of Posts and Telecommunications ›› 2024, Vol. 31 ›› Issue (3): 72-79. doi: 10.19682/j.cnki.1005-8885.2024.1002

• Artificial Intelligence •

Used car price prediction based on XGBoost and retention rate

Shen Yutian1, Chen Jian1, Dai Min1, Zhang Sirui2, Xu Jing1, Wang Qing3

  1. College of Mechanical Engineering, Yangzhou University, Yangzhou 225127, China
  2. College of Economics and Management, Southeast University Chengxian College, Nanjing 210088, China
  3. Shandong Shuncheng Automobile Trade Co., Ltd, Jinan 250000, China

  • Received: 2022-06-12 Revised: 2022-11-10 Online: 2024-06-30 Published: 2024-06-30
  • Corresponding author: Chen Jian, E-mail: chenjian.tud@hotmail.com
  • Supported by:
    the Postgraduate Education Reform Project of Yangzhou University (JGLX2021_002).

Abstract: To improve the accuracy of used car price prediction, this paper proposes a machine learning prediction model
based on the retention rate. First, a random forest algorithm is used to screen the variables in the data, and
seven main characteristic variables that affect used car prices, such as new car price, service time and mileage,
are selected. Then, a linear regression classification method is introduced to classify the test data into high
and low retention rate data. After that, an extreme gradient boosting (XGBoost) regression model is built for each
of the two datasets. The prediction results show that the comprehensive evaluation index of the proposed model is
0.548, up from 0.488 for the original XGBoost model. Finally, compared with other representative machine learning
algorithms, the model shows advantages in terms of mean absolute percentage error (MAPE), 5% accuracy rate and
comprehensive evaluation index. The retention rate-based machine learning model established in this paper therefore
improves the accuracy of used car price prediction.

Key words: random forest, data dimensionality reduction, extreme gradient boosting (XGBoost), retention rate, price prediction
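
The abstract names the retention rate, MAPE and a 5% accuracy rate without defining them on this page. The following is a minimal sketch under stated assumptions, not the authors' implementation: the retention rate is assumed to be the used price divided by the new car price, and the 5% accuracy rate is assumed to be the share of predictions whose relative error is at most 5%. All function names are illustrative. The comprehensive evaluation index is left out because its formula is not given here.

```python
from typing import List, Sequence


def retention_rate(used_price: float, new_price: float) -> float:
    """Assumed definition: fraction of the new-car price the used car retains."""
    if new_price <= 0:
        raise ValueError("new_price must be positive")
    return used_price / new_price


def relative_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Absolute relative error |predicted - actual| / |actual| for each sample."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual prices must be non-zero")
    return [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    errs = relative_errors(actual, predicted)
    return sum(errs) / len(errs)


def accuracy_within(actual: Sequence[float], predicted: Sequence[float],
                    tolerance: float = 0.05) -> float:
    """Assumed definition: share of predictions with relative error <= tolerance."""
    errs = relative_errors(actual, predicted)
    return sum(1 for e in errs if e <= tolerance) / len(errs)
```

For example, with actual prices `[10.0, 20.0, 40.0]` and predictions `[10.4, 22.0, 40.0]`, two of the three predictions fall within the 5% band, so `accuracy_within` returns two thirds, while `mape` averages the relative errors 0.04, 0.10 and 0.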
