The Journal of China Universities of Posts and Telecommunications ›› 2018, Vol. 25 ›› Issue (6): 21-30. doi: 10.19682/j.cnki.1005-8885.2018.1024

• Artificial Intelligence •

Autonomous driving in the uncertain traffic -- a deep reinforcement learning approach

Yang Shun, Wu Jian, Zhang Sumin, Han Wei   

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  2. Institute of Microelectronics, Chinese Academy of Sciences, Beijing 100029, China
  • Received: 2018-10-23 Revised: 2018-12-27 Online: 2018-12-30 Published: 2019-02-26
  • Contact: Zhang Sumin, E-mail: suminzhang@163.com
  • About author: Zhang Sumin, E-mail: suminzhang@163.com
  • Supported by:
    Sample DRL training and demo sequences are provided as supplementary material for the review process. The URLs are given below.
    Agent driving without traffic participants: https://youtu.be/dMMi3a_BaqU
    Agent driving with traffic participants: https://youtu.be/gnSzw9c2TuM

Abstract: Driving safely and efficiently in complex traffic is a difficult task for an autonomous vehicle because of the stochastic behavior of the surrounding human drivers. Deep reinforcement learning (DRL), which combines the abstract representation capability of deep learning (DL) with the optimal decision-making and control capability of reinforcement learning (RL), is a promising approach to this problem. A traffic environment is built by combining the intelligent driver model (IDM) and a lane-change model as the behavioral model for vehicles. To increase the stochasticity of this environment, traffic cars draw their speeds from a distribution with a cutoff, and different politeness factors are used to represent distinct lane-change styles. To train an artificial agent to learn strategies that yield the greatest long-term reward and sophisticated maneuvers, the deep deterministic policy gradient (DDPG) algorithm is used. The reward function is designed to trade off vehicle speed, stability, and driving safety. Results show that the proposed approach achieves good autonomous maneuvering in a scenario with complex traffic behavior through interaction with the environment.

Key words: autonomous driving, complex traffic scenario, DRL, DDPG
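
The abstract names the IDM and a speed distribution with a cutoff but gives no equations on this page. As a rough illustration only, the sketch below uses the commonly published form of the IDM acceleration and a rejection-sampled normal distribution for the desired speed. The function names, the choice of a normal distribution, and every parameter value (`v0`, `T`, `a_max`, `b`, `s0`, `delta`, `mean`, `std`, `low`, `high`) are assumptions made for this example, not values taken from the paper.

```python
import math
import random


def idm_acceleration(v, v_lead, gap, v0=30.0, T=1.5, a_max=1.0, b=2.0,
                     s0=2.0, delta=4.0):
    """Standard-form IDM acceleration (m/s^2) of a follower behind a leader.

    v, v_lead: follower and leader speeds (m/s); gap: bumper-to-bumper gap (m).
    All default parameter values are illustrative assumptions.
    """
    dv = v - v_lead  # closing speed, positive when approaching the leader
    # Desired dynamic gap: standstill gap plus time-headway and braking terms.
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a_max * b)))
    return a_max * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)


def sample_desired_speed(rng, mean=25.0, std=5.0, low=15.0, high=35.0):
    """Draw a desired speed from a normal distribution cut off at [low, high].

    Rejection sampling: redraw until the value falls inside the cutoff.
    """
    while True:
        v0 = rng.gauss(mean, std)
        if low <= v0 <= high:
            return v0


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the traffic is reproducible
    desired = sample_desired_speed(rng)
    accel = idm_acceleration(v=20.0, v_lead=15.0, gap=30.0, v0=desired)
    print(f"desired speed {desired:.1f} m/s, acceleration {accel:.2f} m/s^2")
```

Giving each traffic car its own sampled `v0` is one simple way to make the simulated traffic heterogeneous; the cutoff keeps implausibly slow or fast vehicles out of the scene.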
