JOURNAL OF CHINA UNIVERSITIES OF POSTS AND TELECOM ›› 2018, Vol. 25 ›› Issue (6): 21-30. doi: 10.19682/j.cnki.1005-8885.2018.1024

• Artificial intelligence

Autonomous driving in the uncertain traffic -- a deep reinforcement learning approach

Yang Shun, Wu Jian, Zhang Sumin, Han Wei   

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  2. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China
  • Received:2018-10-23 Revised:2018-12-27 Online:2018-12-30 Published:2019-02-26
  • Contact: Zhang Sumin, E-mail: suminzhang@163.com
  • Supported by:
    Sample DRL training and demo sequences are provided as supplementary material for the review process. The URLs are given below.
    Agent driving without traffic participants: https://youtu.be/dMMi3a_BaqU
    Agent driving with traffic participants: https://youtu.be/gnSzw9c2TuM

Abstract: Driving safely and efficiently in complex traffic is a difficult task for an autonomous vehicle because of the stochastic behavior of the human drivers involved. Deep reinforcement learning (DRL), which combines the abstract representation capability of deep learning (DL) with the optimal decision-making and control capability of reinforcement learning (RL), is a good approach to this problem. The traffic environment is built by combining the intelligent driver model (IDM) with a lane-change model as the behavioral model for vehicles. To increase the stochasticity of the traffic environment, a speed distribution with a cutoff is defined for traffic cars, and various politeness factors are used to represent distinct lane-change styles. To train an artificial agent to learn strategies that yield the greatest long-term reward and sophisticated maneuvers, the deep deterministic policy gradient (DDPG) algorithm is used. The reward function is designed as a trade-off between vehicle speed, stability, and driving safety. Results show that the proposed approach achieves good autonomous maneuvering in a scenario with complex traffic behavior through interaction with the environment.
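As a rough illustration of the traffic model named in the abstract, the sketch below implements the standard IDM car-following acceleration and draws a per-vehicle speed from a distribution with a cutoff. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, all parameter values, the choice of a Gaussian with rejection sampling, and the reading of the sampled quantity as the IDM desired speed are hypothetical, since the abstract does not specify them.

```python
import math
import random


def idm_acceleration(v, gap, dv, v0=30.0, T=1.5, a_max=1.0, b=2.0,
                     s0=2.0, delta=4.0):
    """Standard IDM acceleration (m/s^2) for a following vehicle.

    v   : own speed (m/s)
    gap : bumper-to-bumper distance to the leader (m), must be > 0
    dv  : approach rate, own speed minus leader speed (m/s)
    All default parameter values are illustrative assumptions.
    """
    # Desired dynamic gap: standstill distance + time headway + braking term.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    # Free-road term minus interaction term.
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def sample_desired_speed(mean=30.0, std=3.0, low=22.0, high=38.0, rng=random):
    """Draw a desired speed from a Gaussian cut off at [low, high].

    Rejection sampling is an assumed way to realise a "speed distribution
    with cutoff"; the abstract does not name the distribution.
    """
    while True:
        v0 = rng.gauss(mean, std)
        if low <= v0 <= high:
            return v0


if __name__ == "__main__":
    rng = random.Random(0)
    v0 = sample_desired_speed(rng=rng)
    # A car at 25 m/s closing at 3 m/s on a leader 20 m ahead.
    print(idm_acceleration(v=25.0, gap=20.0, dv=3.0, v0=v0))
```

Giving each traffic car its own sampled speed is one simple way to make the surrounding traffic heterogeneous, so that the learning agent does not face identical behavior in every episode.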

Key words: autonomous driving, complex traffic scenario, DRL, DDPG
