Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models

Cognitive Internet of Things: Frameworks, Tools and Applications (ISAIR 2018)

Part of the book series: Studies in Computational Intelligence (SCI, volume 810)

Abstract

Human detection in crowded situations is a challenging task in many practically relevant scenarios. In this paper, we propose a human detection scheme based on passive stereo depth, which employs a hierarchically structured tree of learned shape templates to delineate clusters corresponding to humans. To make the depth-based detection approach more specific to humans, we also incorporate a visual object recognition modality in the form of a deeply-trained model. We propose a simple way to combine the depth and appearance modalities to better cope with complex effects such as heavily occluded humans, small-sized humans, and clutter. The results are analyzed in terms of the improvements and shortcomings introduced by the individual detection modalities. The proposed combination achieves good accuracy at a decent computational speed in difficult, crowded scenarios. Hence, in our view, the presented concepts represent a detection scheme of practical relevance.

Acknowledgements

The authors thank the Austrian Federal Ministry for Transport, Innovation and Technology and the Austrian Research Promotion Agency (FFG) for co-funding the research project “LEAL” (FFG Nr. 850218) within the National Research Development Programme KIRAS Austria.

Author information

Corresponding author

Correspondence to Csaba Beleznai.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Beleznai, C., Steininger, D., Broneder, E. (2020). Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models. In: Lu, H. (ed.) Cognitive Internet of Things: Frameworks, Tools and Applications. ISAIR 2018. Studies in Computational Intelligence, vol 810. Springer, Cham. https://doi.org/10.1007/978-3-030-04946-1_47
