Abstract
In the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms as well as model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: first, a controller is learned offline from an arbitrarily complicated mathematical system model; then, the trained controller is evaluated online as a fast feedforward map. The contribution of this paper is a simple gradient-free and model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, (i) simultaneous training on separate deterministic tasks is advocated, with the purpose of encoding many motion primitives in a neural network, and (ii) maximally sparse rewards are employed in combination with virtual velocity constraints (VVCs) in setpoint proximity.
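The core idea of the abstract can be sketched in a few lines: gradient-free hill climbing over policy parameters, scored by a maximally sparse reward summed over several separate deterministic tasks. The sketch below is a toy stand-in under loudly labeled assumptions: a 1-D "vehicle" integrator instead of a vehicle model, a linear map instead of a deep network, a hard clip on the control as a crude stand-in for the paper's virtual velocity constraints, and hypothetical names (`tshc`, `rollout_return`, `TASK_STARTS`) that do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separate deterministic tasks: drive a 1-D state from a
# start position to the origin (a stand-in for distinct motion primitives).
TASK_STARTS = [-2.0, -1.0, 1.0, 2.0]


def policy(theta, x):
    """Tiny linear 'network': control u = w*x + b, parameters lumped in theta."""
    w, b = theta
    return w * x + b


def rollout_return(theta, x0, steps=50, dt=0.1, tol=0.05):
    """Maximally sparse reward: +1 only if the setpoint (origin) is reached."""
    x = x0
    for _ in range(steps):
        u = np.clip(policy(theta, x), -1.0, 1.0)  # crude stand-in for VVCs
        x = x + dt * u
        if abs(x) < tol:
            return 1.0
    return 0.0


def tshc(iterations=200, sigma=0.3):
    """Gradient-free hill climbing on the summed return across all tasks."""
    theta = rng.normal(size=2)
    best = sum(rollout_return(theta, x0) for x0 in TASK_STARTS)
    for _ in range(iterations):
        cand = theta + sigma * rng.normal(size=2)
        score = sum(rollout_return(cand, x0) for x0 in TASK_STARTS)
        if score >= best:  # accept only if the aggregate over tasks does not drop
            theta, best = cand, score
    return theta, best
```

Scoring a single parameter vector on all tasks simultaneously is what encodes several motion primitives in one controller; the actual paper trains deep networks on vehicle dynamics, which this sketch does not attempt to reproduce.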
This work was mostly done while MGP was with IMT School for Advanced Studies Lucca, Piazza S. Francesco 19, 55100 Lucca, Italy.
Notes
- 1.
In this setting, the mean and variance of the Gaussian distribution are the outputs of a neural network whose parameters are lumped into \(\theta \).
- 2.
It is mentioned that typically the first, for example, 50,000 samples are collected without a parameter update. However, even then that threshold must be selected, and the fundamental problem still persists.
© 2020 Springer Nature Switzerland AG
Graf Plessen, M. (2020). Automating Vehicles by Deep Reinforcement Learning Using Task Separation with Hill Climbing. In: Arai, K., Bhatia, R. (eds) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_16
Print ISBN: 978-3-030-12384-0
Online ISBN: 978-3-030-12385-7