Motion planning for robot audition

Nguyen, Quan V.; Colas, Francis; Vincent, Emmanuel; Charpillet, François

doi:10.1007/s10514-019-09880-1

Published: 31 July 2019

Autonomous Robots, volume 43, pages 2293–2317 (2019)

Quan V. Nguyen¹,², Francis Colas¹, Emmanuel Vincent¹ & François Charpillet¹

Abstract

Robot audition refers to a range of hearing capabilities which help robots explore and understand their environment. Among them, sound source localization is the problem of estimating the location of a sound source given measurements of its angle of arrival with respect to a microphone array mounted on the robot. In addition, robot motion can help quickly solve the front-back ambiguity existing in a linear microphone array. In this article, we focus on the problem of exploiting robot motion to improve the estimation of the location of an intermittent and possibly moving source in a noisy and reverberant environment. We first propose a robust extended mixture Kalman filtering framework for jointly estimating the source location and its activity over time. Building on this framework, we then propose a long-term robot motion planning algorithm based on Monte Carlo tree search to find an optimal robot trajectory according to two alternative criteria: the Shannon entropy or the standard deviation of the estimated belief on the source location. These criteria are integrated over time using a discount factor. Experimental results show the robustness of the proposed estimation framework to false angle of arrival measurements within $\pm \,20^{\circ }$ and 10% false source activity detection rate. The proposed robot motion planning technique achieves an average localization error 48.7% smaller than a one-step-ahead method. In addition, we compare the correlation between the estimation error and the two criteria, and investigate the effect of the discount factor on the performance of the proposed motion planning algorithm.

References

Alam, J., Kenny, P., Ouellet, P., Stafylakis, T., & Dumouchel, P. (2014). Supervised/unsupervised voice activity detectors for text-dependent speaker recognition on the RSR2015 corpus. In Proceedings of Odyssey.
Ali, A. M., Asgari, S., Collier, T. C., Allen, M., Girod, L., Hudson, R. E., et al. (2009). An empirical study of collaborative acoustic source localization. Journal of Signal Processing Systems, 57(3), 415–436.
Allen, J. B., & Berkley, D. A. (1979). Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4), 943–950.
Amanatiadis, A. A., Chatzichristofis, S. A., Charalampous, K., Doitsidis, L., Kosmatopoulos, E. B., Tsalides, P., et al. (2013). A multi-objective exploration strategy for mobile robots under operational constraints. IEEE Access, 1, 691–702.
Badali, A., Valin, J. M., Michaud, F., & Aarabi, P. (2009). Evaluating real-time audio localization algorithms for artificial audition in robotics. In Proceedings of the IROS (pp. 2033–2038).
Berglund, E., & Sitte, J. (2005). Sound source localisation through active audition. In Proceedings of the IROS (pp. 509–514).
Bhattacharyya, S. (2011). Motion planning and constraint exploration for robotic surgery. Nashville: Vanderbilt University.
Blandin, C., Ozerov, A., & Vincent, E. (2012). Multi-source TDOA estimation in reverberant audio using angular spectra and clustering. Signal Processing, 92(8), 1950–1960.
Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., et al. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1–43.
Bustamante, G., & Danès, P. (2017). Multi-step-ahead information-based feedback control for active binaural localization. In Proceedings of the IROS.
Bustamante, G., Danès, P., Forgue, T., & Podlubne, A. (2016). Towards information-based feedback control for binaural active localization. In Proceedings of the ICASSP (pp. 6325–6329).
Bustamante, G., Danès, P., Forgue, T., Podlubne, A., & Manhès, J. (2017). An information based feedback control for audio-motor binaural localization. Autonomous Robots. https://doi.org/10.1007/s10514-017-9639-8.
Chengalvarayan, R. (1999). Robust energy normalization using speech/nonspeech discriminator for German connected digit recognition. In Proceedings of the Eurospeech.
Colas, F., Mahesh, S., Pomerleau, F., Liu, M., & Siegwart, R. (2013). 3D path planning and execution for search and rescue ground robots. In Proceedings of the IROS (pp. 722–727).
Cooke, M., Lu, Y. C., Lu, Y., & Horaud, R. (2007). Active hearing, active speaking. In Proceedings of the ISAAR (pp. 33–46).
DeJong, B. P. (2012). Auditory occupancy grids with a mobile robot. Journal of Automation, Mobile Robotics and Intelligent Systems, 6(3), 3–12.
DiBiase, J. H., Silverman, H. F., & Brandstein, M. S. (2001). Robust localisation in reverberant rooms. In M. Brandstein & D. Ward (Eds.), Microphone arrays: Signal processing techniques and applications (pp. 157–180). Berlin: Springer.
Dolgov, D., Thrun, S., Montemerlo, M., & Diebel, J. (2008). Practical search techniques in path planning for autonomous driving. In Proceedings of the STAIR.
Evers, C., Moore, A., & Naylor, P. (2016). Towards informative path planning for acoustic SLAM. In Proceedings of the DAGA.
Fallon, M. F., & Godsill, S. J. (2012). Acoustic source localization and tracking of a time-varying number of speakers. IEEE Transactions on Audio, Speech, and Language Processing, 20(4), 1409–1415.
Germain, F. G., Sun, D. L., & Mysore, G. J. (2013). Speaker and noise independent voice activity detection. In Proceedings of the Interspeech.
Girod, L., Lukac, M., Trifa, V., & Estrin, D. (2006). The design and implementation of a self-calibrating distributed acoustic sensing platform. In Proceedings of the SenSys (pp. 71–84).
Gonzalez-Banos, H. H., & Latombe, J. C. (2002). Navigation strategies for exploring indoor environments. The International Journal of Robotics Research, 21(10–11), 829–848.
Hahn, W., & Tretter, S. (1973). Optimum processing for delay-vector estimation in passive signal arrays. IEEE Transactions on Information Theory, 19(5), 608–614.
Hashimoto, S., Narita, S., Kasahara, H., Takanishi, A., Sugano, S., Shirai, K., Kobayashi, T., Takanobu, H., Kurata, T., Fujiwara, K., Matsuno, T., Kawasaki, T., & Hoashi, K. (1997). Humanoid robot-development of an information assistant robot hadaly. In Proceedings of the RO-MAN (pp. 106–111).
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13–30.
Huber, M. F., Bailey, T., Durrant-Whyte, H., & Hanebeck, U. D. (2008). On entropy approximation for Gaussian mixture random vectors. In Proceedings of the MFI (pp. 181–188).
Johnson, D. H., & Dudgeon, D. E. (1992). Array signal processing: Concepts and techniques. New York: Simon & Schuster.
Karray, L., & Martin, A. (2003). Towards improving speech detection robustness for speech recognition in adverse conditions. Speech Communication, 40(3), 261–276.
Kim, U. H., Kim, J., Kim, D., Kim, H., & You, B. J. (2008). Speaker localization using the TDOA-based feature matrix for a humanoid robot. In Proceedings of the RO-MAN (pp. 610–615).
Knapp, C., & Carter, G. (1976). The generalized cross-correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4), 320–327.
Kocsis, L., Szepesvári, C., & Willemson, J. (2006). Improved Monte-Carlo search. Technical Report 1, University of Tartu.
Latombe, J. C. (1991). Robot motion planning. Dordrecht: Kluwer.
LaValle, S. M. (2006). Planning algorithms. Cambridge: Cambridge University Press.
Lu, Y. C., & Cooke, M. (2011). Motion strategies for binaural localisation of speech sources in azimuth and distance by artificial listeners. Speech Communication, 53(5), 622–642.
Magassouba, A. (2016). Aural servo: Towards an alternative approach to sound localization for robot motion control. Ph.D. thesis, Université Rennes 1.
Marković, I., Portello, A., Danès, P., Petrović, I., & Argentieri, S. (2013). Active speaker localization with circular likelihoods and bootstrap filtering. In Proceedings of the IROS (pp. 2914–2920).
Martinson, E., & Schultz, A. (2006). Auditory evidence grids. In Proceedings of the IROS (pp. 1139–1144).
Martinson, E., & Schultz, A. (2009). Discovery of sound sources by an autonomous mobile robot. Autonomous Robots, 27, 221–237.
Marzinzik, M., & Kollmeier, B. (2002). Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing, 10(2), 109–118.
Nakadai, K., Lourens, T., Okuno, H. G., & Kitano, H. (2000). Active audition for humanoid. In Proceedings of the AAAI (pp. 832–839).
Nakadai, K., Okuno, H. G., & Kitano, H. (2002). Real-time sound source localization and separation for robot audition. In Proceedings of the Interspeech (pp. 193–196).
Nakadai, K., Okuno, H. G., & Kitano, H. (2003). Robot recognizes three simultaneous speech by active audition. In Proceedings of the ICRA (pp. 398–405).
Nakadai, K., Takahashi, T., Okuno, H. G., Nakajima, H., Hasegawa, Y., & Tsujino, H. (2010). Design and implementation of robot audition system ’HARK’—Open source software for listening to three simultaneous speakers. Advanced Robotics, 24(5–6), 739–761.
Nakamura, K., Nakadai, K., & Ince, G. (2012). Real-time super-resolution sound source localization for robots. In Proceedings of the IROS (pp. 694–699).
Nguyen, Q. V. (2018). Mapping of a sound environment by a mobile robot. Ph.D. thesis, University of Lorraine.
Nguyen, Q. V., Colas, F., Vincent, E., & Charpillet, F. (2016). Localizing an intermittent and moving sound source using a mobile robot. In Proceedings of the IROS (pp. 61–65).
Nguyen, Q. V., Colas, F., Vincent, E., & Charpillet, F. (2017). Long-term robot motion planning for active sound source localization with Monte Carlo tree search. In Proceedings of the HSCMA (pp. 61–65).
Okuno, H. G., & Nakadai, K. (2015). Robot audition: Its rise and perspectives. In Proceedings of the ICASSP (pp. 5610–5614).
Popoviciu, T. (1935). Sur les équations algébriques ayant toutes leurs racines réelles. Mathematica (Cluj), 9, 129–145.
Portello, A., Bustamante, G., Danès, P., Piat, J., & Manhès, J. (2014). Active localization of an intermittent sound source from a moving binaural sensor. In Proceedings of the Forum Acusticum.
Portello, A., Danès, P., & Argentieri, S. (2011). Acoustic models and Kalman filtering strategies for active binaural sound localization. In Proceedings of the IROS (pp. 137–142).
Portello, A., Danès, P., & Argentieri, S. (2012). Active binaural localization of intermittent moving sources in the presence of false measurements. In Proceedings of the IROS (pp. 3294–3299).
Ramírez, J., Górriz, J. M., & Segura, J. C. (2007). Voice activity detection: Fundamentals and speech recognition system robustness. In M. Grimm & K. Kroschel (Eds.), Robust speech recognition and understanding. Vienna: Intech.
Ramirez, J., Segura, J. C., Benitez, C., de la Torre, A., & Rubio, A. J. (2003). A new adaptive long-term spectral estimation voice activity detector. In Proceedings of the Eurospeech.
Schmidt, R. (1986). Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3), 276–280.
Schymura, C., Grajales, J. D. R., & Kolossa, D. (2017). Monte Carlo exploration for active binaural localization. In Proceedings of the ICASSP (pp. 491–495).
Siegwart, R., Nourbakhsh, I. R., & Scaramuzza, D. (2011). Introduction to autonomous mobile robots. Cambridge: MIT Press.
Siotani, M. (1964). Tolerance regions for a multivariate normal population. Annals of the Institute of Statistical Mathematics, 16(1), 135–153.
Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.
Song, K., Liu, Q., & Wang, Q. (2011). Olfaction and hearing based mobile robot navigation for odor/sound source search. Sensors, 11, 2129–2154.
Tanyer, S. G., & Ozer, H. (2000). Voice activity detection in nonstationary noise. IEEE Transactions on Speech and Audio Processing, 8(4), 478–482.
Valin, J. M., Michaud, F., & Rouat, J. (2007). Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering. Robotics and Autonomous Systems, 55(3), 216–228.
Valin, J. M., Yamamoto, S., Rouat, J., Michaud, F., Nakadai, K., & Okuno, H. (2007). Robust recognition of simultaneous speech by a mobile robot. IEEE Transactions on Robotics, 23(4), 742–752.
Van Veen, B. D., & Buckley, K. M. (1988). Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, 5(2), 4–24.
Vermaak, J., & Blake, A. (2001). Nonlinear filtering for speaker tracking in noisy and reverberant environments. In Proceedings of the ICASSP (Vol. 5, pp. 3021–3024).
Vincent, E., Sini, A., & Charpillet, F. (2015). Audio source localization by optimal control of a mobile robot. In Proceedings of the ICASSP (pp. 5630–5634).
Wightman, F. L., & Kistler, D. J. (1999). Resolution of front-back ambiguity in spatial hearing by listener and source movement. The Journal of the Acoustical Society of America, 105(5), 2841–2853.
Woo, K. H., Yang, T. Y., Park, K. J., & Lee, C. (2000). Robust voice activity detection algorithm for estimating noise spectrum. IET Electronics Letters, 36(2), 180–181.
Yamauchi, B. (1997). A frontier-based approach for autonomous exploration. In Proceedings of the CIRA (pp. 146–151).
Zhang, X. L., & Wu, J. (2013). Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech, and Language Processing, 21(4), 697–710.

Acknowledgements

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

Author information

Authors and Affiliations

Université de Lorraine, CNRS, Inria, Loria, 54000, Nancy, France
Quan V. Nguyen, Francis Colas, Emmanuel Vincent & François Charpillet
Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab, 38000, Grenoble, France
Quan V. Nguyen

Corresponding author

Correspondence to Quan V. Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Chernoff–Hoeffding inequality

The UCT criterion derives from the Chernoff–Hoeffding inequality, which is valid for a bounded reward function (Hoeffding 1963). The inequality is stated in the theorem below.

Theorem: Let $Y_1, Y_2, \ldots , Y_n$ be independent random variables taking values in the range [a, b]. Denote by $\mu _i = {\mathbb {E}}(Y_i)$ their expected values, and let $Y = \frac{1}{n}\sum _iY_i$ and $\mu ={\mathbb {E}}(Y)=\frac{1}{n}\sum _i\mu _i$. Then, for all $\epsilon >0$, we have:

$$\begin{aligned} P(|Y-\mu |>\epsilon )\le 2e^{-2n\epsilon ^2/(b-a)^2}. \end{aligned}$$

(41)
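
As an illustration, the short Python sketch below evaluates the right-hand side of (41) and compares it with a Monte Carlo estimate of the left-hand side. The function names, the choice of uniformly distributed $Y_i$, and all numerical values are assumptions made for this example only; they are not taken from the article. The helper `confidence_radius` simply solves the right-hand side of (41) for $\epsilon$ at a given probability level.

```python
import math
import random


def hoeffding_bound(n, eps, a, b):
    """Right-hand side of (41): 2 * exp(-2 * n * eps^2 / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / (b - a) ** 2)


def confidence_radius(n, delta, a, b):
    """Value of eps at which the bound equals delta (solve (41) for eps)."""
    return (b - a) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def empirical_tail(n, eps, a, b, trials=5000, seed=0):
    """Monte Carlo estimate of P(|Y - mu| > eps) for n i.i.d. Uniform(a, b) draws.

    The uniform distribution is an assumption of this example; the theorem
    only requires independent variables bounded in [a, b].
    """
    rng = random.Random(seed)
    mu = 0.5 * (a + b)
    exceed = 0
    for _ in range(trials):
        y = sum(rng.uniform(a, b) for _ in range(n)) / n
        if abs(y - mu) > eps:
            exceed += 1
    return exceed / trials


if __name__ == "__main__":
    n, eps, a, b = 50, 0.1, 0.0, 1.0  # illustrative values only
    print("Hoeffding bound :", hoeffding_bound(n, eps, a, b))
    print("empirical tail  :", empirical_tail(n, eps, a, b))
    print("radius at 5%    :", confidence_radius(n, 0.05, a, b))
```

The empirical tail probability should stay below the bound, which is typically loose because it holds for every distribution supported on [a, b].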

In our case, $Y_i$ is the reward Q at the end of each simulation and Y is the average reward $\frac{{\bar{Q}}(n')}{N(n')}$ at each child node in the tree. For the Chernoff–Hoeffding inequality to apply, the two criteria, entropy and standard deviation, must therefore be bounded. We show that they are in the following subsections.
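
The role of the average reward in the tree search can be sketched with the standard UCT selection rule as given in the MCTS literature (e.g. Browne et al. 2012). This is a minimal sketch and not necessarily the exact form used in this article: the `Child` class, the exploration constant `c_p`, and the assumption that rewards have been rescaled to [0, 1] are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    total_reward: float  # accumulated reward, playing the role of Q-bar(n')
    visits: int          # visit count, playing the role of N(n')


def uct_score(child, parent_visits, c_p=1.0 / math.sqrt(2.0)):
    """Average reward plus the exploration bonus 2 * c_p * sqrt(2 ln N / N')."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    average = child.total_reward / child.visits
    bonus = 2.0 * c_p * math.sqrt(2.0 * math.log(parent_visits) / child.visits)
    return average + bonus


def select_child(children, parent_visits, c_p=1.0 / math.sqrt(2.0)):
    """Index of the child with the highest UCT score."""
    return max(range(len(children)),
               key=lambda i: uct_score(children[i], parent_visits, c_p))


if __name__ == "__main__":
    # Hypothetical statistics for two children of a node visited 12 times.
    children = [Child(total_reward=8.0, visits=10), Child(total_reward=1.5, visits=2)]
    print("selected with exploration   :", select_child(children, 12))
    print("selected without exploration:", select_child(children, 12, c_p=0.0))
```

The exploration bonus has the same square-root form as the confidence radius obtained from the inequality, which is why a bounded reward is needed for the rule to be meaningful.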

1.1 Bounded entropy

For a scalar random variable X in the range [a, b] with no other constraints, the maximum entropy distribution of X is the uniform distribution over this range. In that case the formula for calculating the maximum entropy is expressed as follows:

$$\begin{aligned} \begin{aligned} H_{\mathrm {max,X}}&= -\int ^b_a{p(X)\log {p(X)}}dX \\&= -\log {p(X)}\int ^b_a{p(X)}dX \\&= -\log {p(X)} \\&= -\log {\frac{1}{(b-a)}} \\&= \log (b-a) \\&= \log \left| \mathrm {range}_{X}\right| \end{aligned} \end{aligned}$$

(42)

The formula for computing the maximum entropy of the belief is:

$$\begin{aligned} \begin{aligned} H_{\mathrm {max}}&=\log {\left| \mathrm {range}_{x_{\mathrm {r}}}\right| } + \log {\left| \mathrm {range}_{y_{\mathrm {r}}}\right| } +\log {\left| \mathrm {range}_{\theta _{\mathrm {r}}}\right| } \\&\quad +\log {\left| \mathrm {range}_{x_{\mathrm {s}}}\right| } +\log {\left| \mathrm {range}_{y_{\mathrm {s}}}\right| }+ \log {\left| \mathrm {range}_{\theta _{\mathrm {s}}}\right| } \\&\quad +\log {\left| \mathrm {range}_{v_{\mathrm {s}}}\right| }+\log {\left| \mathrm {range}_{w_{\mathrm {s}}}\right| } \\&=20.3027. \end{aligned} \end{aligned}$$

(43)
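
The two formulas above can be checked numerically with a small sketch. It assumes natural logarithms (entropy in nats) and uses hypothetical state ranges, since the actual ranges behind the value 20.3027 are not listed in this appendix; the sketch therefore does not attempt to reproduce that number. The triangular density is only there to show a non-uniform distribution on the same interval having lower entropy.

```python
import math


def uniform_entropy(lo, hi):
    """Closed form of (42): entropy of Uniform(lo, hi) is log(hi - lo)."""
    return math.log(hi - lo)


def numeric_entropy(pdf, lo, hi, steps=20000):
    """Midpoint-rule approximation of -integral of p(x) log p(x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        p = pdf(lo + (k + 0.5) * h)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total


def joint_max_entropy(ranges):
    """Sum of per-dimension uniform entropies, as in (43)."""
    return sum(uniform_entropy(lo, hi) for lo, hi in ranges.values())


def triangular_pdf(lo, hi):
    """Symmetric triangular density on [lo, hi], used only for comparison."""
    mid, half = 0.5 * (lo + hi), 0.5 * (hi - lo)
    return lambda x: max(0.0, 1.0 - abs(x - mid) / half) / half


if __name__ == "__main__":
    lo, hi = 0.0, 4.0
    print("uniform, closed form:", uniform_entropy(lo, hi))
    print("uniform, numeric    :", numeric_entropy(lambda x: 1.0 / (hi - lo), lo, hi))
    print("triangular, numeric :", numeric_entropy(triangular_pdf(lo, hi), lo, hi))

    # Hypothetical ranges, NOT the ones used in the article.
    ranges = {"x_s": (0.0, 10.0), "y_s": (0.0, 5.0), "theta_s": (-math.pi, math.pi)}
    print("joint maximum entropy:", joint_max_entropy(ranges))
```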

In theory, the minimum entropy, reached with perfect knowledge, is $-\infty $, but this value is not achievable in practice, so we compute a practical lower bound as follows. To find the minimum entropy of the estimated belief, we start the belief propagation with perfect knowledge of the source position. In the nonlinear MKF, this is represented by a single active-source hypothesis whose source position variance is equal to 0. After the prediction step, there is one active-source hypothesis with a higher weight and one inactive-source hypothesis with a lower weight, and uncertainty appears because of the process noise Q in the dynamic model. We then evaluate the uncertainty of the source location estimate after the update step. We search over the angle from the robot to the source, the distance from the robot to the source in the range from 0.18 m to 8 m, and the AoA observation for the combination that minimizes the entropy of the belief after this update step. The resulting minimum entropy is $H_{\mathrm {min}}=-38.7824$, obtained for an AoA of $176^{\circ }$, which does not suffer from the front-back ambiguity, and a distance of 0.18 m from the robot to the source.

The entropy is therefore bounded above by the entropy of the uniform distribution over the state vector, and below by the entropy of the belief obtained when there is no front-back ambiguity and the sound source is as close as possible to the robot (0.18 m, due to the size of the robot).
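
To make the divergence of the theoretical minimum concrete, the sketch below evaluates the differential entropy of a one-dimensional Gaussian, which decreases without bound as its standard deviation shrinks. Using a single scalar Gaussian in place of the full belief is a simplifying assumption of this example. The sketch also shows one hypothetical way of rescaling an entropy value to [0, 1] with the two bounds quoted above; the function `normalise_entropy` and its clipping behaviour are illustrative and not described in the article.

```python
import math

H_MAX = 20.3027    # upper bound quoted in (43)
H_MIN = -38.7824   # practical lower bound quoted in the text


def gaussian_entropy(sigma):
    """Differential entropy (in nats) of a scalar Gaussian with std sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


def normalise_entropy(h, h_min=H_MIN, h_max=H_MAX):
    """Hypothetical rescaling of an entropy value to [0, 1], with clipping."""
    h = min(max(h, h_min), h_max)
    return (h - h_min) / (h_max - h_min)


if __name__ == "__main__":
    # The entropy keeps decreasing as the belief becomes sharper.
    for sigma in (1.0, 1e-2, 1e-4, 1e-8):
        print(f"sigma = {sigma:g}  entropy = {gaussian_entropy(sigma):.4f}")
    print("normalised value of H = 0:", normalise_entropy(0.0))
```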

1.2 Bounded standard deviation

Let a and b be, respectively, the lower and upper bounds on the values of a random variable with an arbitrary probability distribution. Then, according to Popoviciu’s inequality on variances (Popoviciu 1935), its variance satisfies:

$$\begin{aligned} {\sigma ^2}\le {\frac{1}{4}}(b-a)^{2}, \end{aligned}$$

(44)

so that its standard deviation is bounded as follows:

$$\begin{aligned} -{\frac{1}{2}}(b-a)\le {\sigma }\le {\frac{1}{2}}(b-a). \end{aligned}$$

(45)
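
A quick numerical check of Popoviciu’s inequality can be sketched as follows. The interval, the sample sizes, and the function names are assumptions of this example. Since a standard deviation is never negative, only the upper bound in (45) is ever active; it is attained by a distribution that puts half of its mass on each endpoint.

```python
import random
import statistics


def variance_bound(a, b):
    """Upper bound of (44): (b - a)^2 / 4."""
    return 0.25 * (b - a) ** 2


def std_bound(a, b):
    """Upper bound of (45): (b - a) / 2."""
    return 0.5 * (b - a)


if __name__ == "__main__":
    a, b = -2.0, 6.0  # illustrative interval
    rng = random.Random(0)

    # Uniform samples on [a, b]: variance well below the bound.
    uniform_samples = [rng.uniform(a, b) for _ in range(10000)]
    print("uniform variance  :", statistics.pvariance(uniform_samples))

    # Equal mass on both endpoints: the bound is attained.
    print("two-point variance:", statistics.pvariance([a, b]))
    print("variance bound    :", variance_bound(a, b))
    print("std bound         :", std_bound(a, b))
```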

About this article

Cite this article

Nguyen, Q.V., Colas, F., Vincent, E. et al. Motion planning for robot audition. Auton Robot 43, 2293–2317 (2019). https://doi.org/10.1007/s10514-019-09880-1

Received: 08 November 2018
Accepted: 18 July 2019
Published: 31 July 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10514-019-09880-1
