Automating Vehicles by Deep Reinforcement Learning Using Task Separation with Hill Climbing

  • Conference paper
  • In: Advances in Information and Communication (FICC 2019)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 70)


Abstract

Within the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms as well as model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every (short) sampling time. To circumvent this trade-off, a two-step procedure is motivated: a controller is first learned offline based on an arbitrarily complicated mathematical system model, and the trained controller is then evaluated online as a fast feedforward map. The contribution of this paper is a simple gradient-free, model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, (i) simultaneous training on separate deterministic tasks, with the purpose of encoding many motion primitives in a neural network, and (ii) the use of maximally sparse rewards in combination with virtual velocity constraints (VVCs) in setpoint proximity are advocated.

This work was mostly done while MGP was with IMT School for Advanced Studies Lucca, Piazza S. Francesco 19, 55100 Lucca, Italy.
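
To make the proposed procedure concrete, the following is a minimal sketch of gradient-free hill climbing with task separation, in the spirit of the abstract; `rollout_return` (assumed to run one deterministic simulator rollout and report a solved-flag plus an episode return), the task list, and all hyperparameters are illustrative assumptions rather than the paper's actual algorithm or settings.

```python
import numpy as np

def tshc_sketch(theta, tasks, rollout_return, iterations=100,
                n_candidates=32, sigma=0.1, seed=0):
    """Hedged sketch: gradient-free hill climbing over lumped controller
    parameters theta, training simultaneously on separate deterministic
    tasks so that a single network encodes many motion primitives."""
    rng = np.random.default_rng(seed)
    best_theta, best_score = np.asarray(theta, dtype=float), -np.inf
    for _ in range(iterations):
        for _ in range(n_candidates):
            # Perturb the current best parameters (hill-climbing step).
            cand = best_theta + sigma * rng.standard_normal(best_theta.shape)
            # Evaluate the candidate on every task separately.
            results = [rollout_return(cand, task) for task in tasks]
            # Task separation: a candidate counts only if it solves all tasks.
            if all(solved for solved, _ in results):
                score = sum(ret for _, ret in results)
                if score > best_score:
                    best_theta, best_score = cand, score
    return best_theta
```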


Notes

  1. In this setting, the mean and variance of the Gaussian distribution are the outputs of a neural network whose parameters are summarized by the lumped vector \(\theta\) (see the first sketch after these notes).

  2. It is mentioned that, typically, the first (for example) 50000 samples are collected without any parameter update. However, even then that threshold must be selected, and the fundamental problem persists (see the second sketch after these notes).
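
To illustrate Note 1, here is a hedged sketch (in PyTorch, which the paper does not prescribe) of a stochastic policy whose mean and standard deviation (equivalently, variance) are network outputs, with all weights lumped into \(\theta\); the architecture and dimensions are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Network whose outputs parameterize a Gaussian action distribution;
    all weights and biases together play the role of the lumped theta."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mean_head = nn.Linear(hidden, action_dim)
        self.log_std_head = nn.Linear(hidden, action_dim)  # log std keeps scale positive

    def forward(self, state):
        h = self.body(state)
        mean = self.mean_head(h)
        std = self.log_std_head(h).exp()
        return torch.distributions.Normal(mean, std)

# Sample an action from the state-conditioned Gaussian: a ~ pi_theta(.|s)
policy = GaussianPolicy(state_dim=4, action_dim=2)
action = policy(torch.zeros(4)).sample()
```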
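
Note 2 refers to the common off-policy warm-up heuristic, sketched minimally below; only the 50000-sample figure comes from the note, while the buffer size and the hypothetical `update_parameters` routine are illustrative assumptions.

```python
from collections import deque

WARMUP = 50_000                          # the threshold that must itself be selected
replay_buffer = deque(maxlen=1_000_000)  # assumed replay-buffer capacity

def store_and_maybe_update(transition, update_parameters):
    """Collect transitions; defer every parameter update until the buffer
    holds at least WARMUP samples (the heuristic discussed in Note 2)."""
    replay_buffer.append(transition)
    if len(replay_buffer) >= WARMUP:
        update_parameters(replay_buffer)
```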

Author information

Correspondence to Mogens Graf Plessen.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Graf Plessen, M. (2020). Automating Vehicles by Deep Reinforcement Learning Using Task Separation with Hill Climbing. In: Arai, K., Bhatia, R. (eds) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_16
