Deep reinforcement learning for time optimal velocity control using prior knowledge

Gabriel Hartmann, Zvi Shiller, Amos Azaria

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Scopus citations

Abstract

Autonomous navigation has recently gained great interest in the field of reinforcement learning. However, little attention has been given to the time-optimal velocity control problem, i.e., controlling a vehicle so that it travels at the maximal speed without becoming dynamically unstable (rolling over or sliding). Time-optimal velocity control can be solved numerically using existing methods based on optimal control and vehicle dynamics. In this paper, we use deep reinforcement learning to generate the time-optimal velocity control. Furthermore, we use the numerical solution to further improve the performance of the reinforcement learner. We show that the reinforcement learner outperforms the numerically derived solution, and that the hybrid approach (combining learning with the numerical solution) speeds up the training process.
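The hybrid idea described in the abstract — warm-starting a reinforcement learner with transitions generated by a numerically derived controller — can be illustrated with a minimal toy sketch. This is not the paper's actual method (the paper uses deep RL on real vehicle dynamics); here a tabular Q-learner on a simplified 1-D speed model stands in, and the dynamics, reward, and `numerical_policy` baseline are all illustrative assumptions.

```python
# Hedged sketch: seeding a Q-learner with trajectories from a numerical
# (optimal-control-style) policy before regular epsilon-greedy training.
# All dynamics and rewards here are toy assumptions, not the paper's model.
import random

random.seed(0)

LIMIT = 4              # assumed max dynamically stable speed (toy stand-in
                       # for the roll-over / sliding constraint)
ACTIONS = [-1, 0, +1]  # decelerate / hold / accelerate

def step(v, a):
    """Toy dynamics: return (next_speed, reward, done)."""
    nv = max(0, v + a)
    if nv > LIMIT:                 # exceeding the limit = instability
        return nv, -10.0, True
    return nv, float(nv), False    # reward grows with speed (time-optimality)

def numerical_policy(v):
    """Stand-in for the numerically derived solution: race up to the limit."""
    return +1 if v < LIMIT else 0

def train(episodes, seed_demos=0):
    """Tabular Q-learning, optionally warm-started with demo trajectories."""
    Q = {(v, a): 0.0 for v in range(LIMIT + 2) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.9, 0.2

    def update(v, a):
        nv, r, done = step(v, a)
        best = max(Q[(nv, b)] for b in ACTIONS)
        Q[(v, a)] += alpha * (r + gamma * best - Q[(v, a)])
        return nv, done

    # Phase 1: replay transitions produced by the numerical policy.
    for _ in range(seed_demos):
        v = 0
        for _ in range(10):
            v, done = update(v, numerical_policy(v))
            if done:
                break

    # Phase 2: ordinary epsilon-greedy Q-learning.
    for _ in range(episodes):
        v = 0
        for _ in range(10):
            if random.random() < eps:
                a = random.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: Q[(v, b)])
            v, done = update(v, a)
            if done:
                break
    return Q
```

The warm-start phase plays the same role the numerical solution plays in the paper's hybrid approach: it biases the value estimates toward fast-but-stable behavior, so the learner needs fewer exploratory episodes before it stops accelerating past the stability limit.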

Original language: English
Title of host publication: Proceedings - IEEE 31st International Conference on Tools with Artificial Intelligence, ICTAI 2019
Publisher: IEEE Computer Society
Pages: 186-193
Number of pages: 8
ISBN (Electronic): 9781728137988
DOIs
State: Published - Nov 2019
Event: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019 - Portland, United States
Duration: 4 Nov 2019 - 6 Nov 2019

Publication series

Name: Proceedings - International Conference on Tools with Artificial Intelligence, ICTAI
Volume: 2019-November
ISSN (Print): 1082-3409

Conference

Conference: 31st IEEE International Conference on Tools with Artificial Intelligence, ICTAI 2019
Country/Territory: United States
City: Portland
Period: 4/11/19 - 6/11/19

Keywords

  • Autonomous vehicles
  • Reinforcement Learning
  • Time optimal velocity

