Abstract
In the context of autonomous driving, a model-based reinforcement learning algorithm is proposed for the design of neural-network-parameterized controllers. Classical model-based control methods, which include sampling- and lattice-based algorithms as well as model predictive control, suffer from a trade-off between model complexity and the computational burden of solving expensive optimization or search problems online at every short sampling time. To circumvent this trade-off, a two-step procedure is motivated: first, a controller is learned offline from an arbitrarily complicated mathematical system model; then, the trained controller is evaluated online as a fast feedforward map. The contribution of this paper is a simple gradient-free and model-based algorithm for deep reinforcement learning using task separation with hill climbing (TSHC). In particular, (i) simultaneous training on separate deterministic tasks is advocated, with the purpose of encoding many motion primitives in a neural network, and (ii) maximally sparse rewards are employed in combination with virtual velocity constraints (VVCs) in setpoint proximity.
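The core idea of the abstract can be sketched in a few lines: gradient-free hill climbing over policy parameters, scored by a maximally sparse reward summed over several separate deterministic tasks. The sketch below is a toy stand-in under loudly labeled assumptions: a 1-D "vehicle" integrator instead of a vehicle model, a linear map instead of a deep network, a hard clip on the control as a crude stand-in for the paper's virtual velocity constraints, and hypothetical names (`tshc`, `rollout_return`, `TASK_STARTS`) that do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical separate deterministic tasks: drive a 1-D state from a
# start position to the origin (a stand-in for distinct motion primitives).
TASK_STARTS = [-2.0, -1.0, 1.0, 2.0]


def policy(theta, x):
    """Tiny linear 'network': control u = w*x + b, parameters lumped in theta."""
    w, b = theta
    return w * x + b


def rollout_return(theta, x0, steps=50, dt=0.1, tol=0.05):
    """Maximally sparse reward: +1 only if the setpoint (origin) is reached."""
    x = x0
    for _ in range(steps):
        u = np.clip(policy(theta, x), -1.0, 1.0)  # crude stand-in for VVCs
        x = x + dt * u
        if abs(x) < tol:
            return 1.0
    return 0.0


def tshc(iterations=200, sigma=0.3):
    """Gradient-free hill climbing on the summed return across all tasks."""
    theta = rng.normal(size=2)
    best = sum(rollout_return(theta, x0) for x0 in TASK_STARTS)
    for _ in range(iterations):
        cand = theta + sigma * rng.normal(size=2)
        score = sum(rollout_return(cand, x0) for x0 in TASK_STARTS)
        if score >= best:  # accept only if the aggregate over tasks does not drop
            theta, best = cand, score
    return theta, best
```

Scoring a single parameter vector on all tasks simultaneously is what encodes several motion primitives in one controller; the actual paper trains deep networks on vehicle dynamics, which this sketch does not attempt to reproduce.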
This work was mostly done while MGP was with IMT School for Advanced Studies Lucca, Piazza S. Francesco 19, 55100 Lucca, Italy.
Notes
- 1.
In this setting, the mean and variance of the Gaussian distribution are the outputs of a neural network whose parameters are lumped into \(\theta \).
- 2.
It is mentioned that typically the first, for example, 50,000 samples are collected without a parameter update. However, even then that threshold must be selected, and the fundamental problem still persists.
© 2020 Springer Nature Switzerland AG
Graf Plessen, M. (2020). Automating Vehicles by Deep Reinforcement Learning Using Task Separation with Hill Climbing. In: Arai, K., Bhatia, R. (eds) Advances in Information and Communication. FICC 2019. Lecture Notes in Networks and Systems, vol 70. Springer, Cham. https://doi.org/10.1007/978-3-030-12385-7_16
Print ISBN: 978-3-030-12384-0
Online ISBN: 978-3-030-12385-7