The Journal of China Universities of Posts and Telecommunications, 2024, Vol. 31, Issue (3): 72-79. doi: 10.19682/j.cnki.1005-8885.2024.1002

Used car price prediction based on XGBoost and retention rate

Shen Yutian, Chen Jian, Dai Min, Zhang Sirui, Xu Jing, Wang Qing   

  1. College of Mechanical Engineering, Yangzhou University, Yangzhou 225127, China
  2. College of Economics and Management, Southeast University Chengxian College, Nanjing 210088, China
  3. Shandong Shuncheng Automobile Trade Co., Ltd., Jinan 250000, China
  • Received: 2022-06-12 Revised: 2022-11-10 Online: 2024-06-30 Published: 2024-06-30
  • Contact: Jian CHEN E-mail: chenjian.tud@hotmail.com
  • Supported by:
    the Postgraduate Education Reform Project of Yangzhou University (JGLX2021_002).

Abstract: To improve the accuracy of used car price prediction, this paper proposes a machine learning prediction
model based on the retention rate. First, a random forest algorithm is used to screen the variables in the data,
selecting seven main feature variables that affect used car prices, including new car price, service time, and
mileage. Then, a linear regression classification method is introduced to classify the test data into high and low
retention rate data. Next, an extreme gradient boosting (XGBoost) regression model is built for each of the two
datasets. The prediction results show that the comprehensive evaluation index of the proposed model is 0.548, up
from 0.488 for the original XGBoost model. Finally, compared with other representative machine learning
algorithms, the model shows advantages in terms of mean absolute percentage error (MAPE), 5% accuracy rate, and
comprehensive evaluation index. Overall, the retention rate-based machine learning model established in this paper
has clear advantages in the accuracy of used car price prediction.
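
The quantities named in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: the retention rate is taken to be the used car price divided by the new car price, and the 5% accuracy rate is taken to be the share of predictions whose relative error is at most 5%. The function names are hypothetical, and the paper's own definitions may differ. The comprehensive evaluation index is left out because the abstract does not give its formula.

```python
from typing import Sequence


def retention_rate(used_price: float, new_price: float) -> float:
    """Assumed definition: used car price as a fraction of the new car price."""
    if new_price <= 0:
        raise ValueError("new_price must be positive")
    return used_price / new_price


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted))
    return total / len(actual)


def accuracy_within(
    actual: Sequence[float], predicted: Sequence[float], tol: float = 0.05
) -> float:
    """Assumed '5% accuracy rate': share of predictions with relative error <= tol."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) / abs(a) <= tol)
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up prices for illustration only; these are not data from the paper.
    actual = [10.0, 20.0, 40.0]
    predicted = [10.4, 18.0, 40.0]
    print("retention rate:", retention_rate(6.0, 10.0))
    print("MAPE:", mape(actual, predicted))
    print("5% accuracy rate:", accuracy_within(actual, predicted))
```

Under these assumptions, a lower MAPE and a higher 5% accuracy rate both indicate better price predictions, which is the direction of comparison the abstract reports.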

Key words: random forest, data dimensionality reduction, extreme gradient boosting (XGBoost), retention rate, price prediction