Motion planning for robot audition

Autonomous Robots

Abstract

Robot audition refers to a range of hearing capabilities which help robots explore and understand their environment. Among them, sound source localization is the problem of estimating the location of a sound source given measurements of its angle of arrival (AoA) with respect to a microphone array mounted on the robot. Robot motion can also help quickly resolve the front-back ambiguity inherent to a linear microphone array. In this article, we focus on the problem of exploiting robot motion to improve the estimation of the location of an intermittent and possibly moving source in a noisy and reverberant environment. We first propose a robust extended mixture Kalman filtering framework for jointly estimating the source location and its activity over time. Building on this framework, we then propose a long-term robot motion planning algorithm based on Monte Carlo tree search to find an optimal robot trajectory according to two alternative criteria: the Shannon entropy or the standard deviation of the estimated belief on the source location. These criteria are integrated over time using a discount factor. Experimental results show the robustness of the proposed estimation framework to false angle of arrival measurements within \(\pm \,20^{\circ }\) and to a 10% false source activity detection rate. The proposed robot motion planning technique achieves an average localization error 48.7% smaller than a one-step-ahead method. In addition, we compare the correlation between the estimation error and the two criteria, and investigate the effect of the discount factor on the performance of the proposed motion planning algorithm.

References

  • Alam, J., Kenny, P., Ouellet, P., Stafylakis, T., & Dumouchel, P. (2014). Supervised/unsupervised voice activity detectors for text-dependent speaker recognition on the RSR2015 corpus. In Proceedings of Odyssey.

  • Ali, A. M., Asgari, S., Collier, T. C., Allen, M., Girod, L., Hudson, R. E., et al. (2009). An empirical study of collaborative acoustic source localization. Journal of Signal Processing Systems, 57(3), 415–436.

  • Allen, J. B., & Berkley, D. A. (1979). Image method for efficiently simulating small-room acoustics. The Journal of the Acoustical Society of America, 65(4), 943–950.

  • Amanatiadis, A. A., Chatzichristofis, S. A., Charalampous, K., Doitsidis, L., Kosmatopoulos, E. B., Tsalides, P., et al. (2013). A multi-objective exploration strategy for mobile robots under operational constraints. IEEE Access, 1, 691–702.

  • Badali, A., Valin, J. M., Michaud, F., & Aarabi, P. (2009). Evaluating real-time audio localization algorithms for artificial audition in robotics. In Proceedings of the IROS (pp. 2033–2038).

  • Berglund, E., & Sitte, J. (2005). Sound source localisation through active audition. In Proceedings of the IROS (pp. 509–514).

  • Bhattacharyya, S. (2011). Motion planning and constraint exploration for robotic surgery. Nashville: Vanderbilt University.

  • Blandin, C., Ozerov, A., & Vincent, E. (2012). Multi-source TDOA estimation in reverberant audio using angular spectra and clustering. Signal Processing, 92(8), 1950–1960.

  • Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., et al. (2012). A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), 1–43.

  • Bustamante, G., & Danès, P. (2017). Multi-step-ahead information-based feedback control for active binaural localization. In Proceedings of the IROS.

  • Bustamante, G., Danès, P., Forgue, T., & Podlubne, A. (2016). Towards information-based feedback control for binaural active localization. In Proceedings of the ICASSP (pp. 6325–6329).

  • Bustamante, G., Danès, P., Forgue, T., Podlubne, A., & Manhès, J. (2017). An information based feedback control for audio-motor binaural localization. Autonomous Robots. https://doi.org/10.1007/s10514-017-9639-8.

  • Chengalvarayan, R. (1999). Robust energy normalization using speech/nonspeech discriminator for German connected digit recognition. In Proceedings of the Eurospeech.

  • Colas, F., Mahesh, S., Pomerleau, F., Liu, M., & Siegwart, R. (2013). 3D path planning and execution for search and rescue ground robots. In Proceedings of the IROS (pp. 722–727).

  • Cooke, M., Lu, Y. C., Lu, Y., & Horaud, R. (2007). Active hearing, active speaking. In Proceedings of the ISAAR (pp. 33–46).

  • DeJong, B. P. (2012). Auditory occupancy grids with a mobile robot. Journal of Automation, Mobile Robotics and Intelligent Systems, 6(3), 3–12.

  • DiBiase, J. H., Silverman, H. F., & Brandstein, M. S. (2001). Robust localisation in reverberant rooms. In M. Brandstein & D. Ward (Eds.), Microphone arrays: Signal processing techniques and applications (pp. 157–180). Berlin: Springer.

  • Dolgov, D., Thrun, S., Montemerlo, M., & Diebel, J. (2008). Practical search techniques in path planning for autonomous driving. In Proceedings of the STAIR.

  • Evers, C., Moore, A., & Naylor, P. (2016). Towards informative path planning for acoustic SLAM. In Proceedings of the DAGA.

  • Fallon, M. F., & Godsill, S. J. (2012). Acoustic source localization and tracking of a time-varying number of speakers. IEEE Transactions on Audio, Speech, and Language Processing, 20(4), 1409–1415.

  • Germain, F. G., Sun, D. L., & Mysore, G. J. (2013). Speaker and noise independent voice activity detection. In Proceedings of the Interspeech.

  • Girod, L., Lukac, M., Trifa, V., & Estrin, D. (2006). The design and implementation of a self-calibrating distributed acoustic sensing platform. In Proceedings of the SenSys (pp. 71–84).

  • Gonzalez-Banos, H. H., & Latombe, J. C. (2002). Navigation strategies for exploring indoor environments. The International Journal of Robotics Research, 21(10–11), 829–848.

  • Hahn, W., & Tretter, S. (1973). Optimum processing for delay-vector estimation in passive signal arrays. IEEE Transactions on Information Theory, 19(5), 608–614.

  • Hashimoto, S., Narita, S., Kasahara, H., Takanishi, A., Sugano, S., Shirai, K., Kobayashi, T., Takanobu, H., Kurata, T., Fujiwara, K., Matsuno, T., Kawasaki, T., & Hoashi, K. (1997). Humanoid robot-development of an information assistant robot Hadaly. In Proceedings of the RO-MAN (pp. 106–111).

  • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13–30.

  • Huber, M. F., Bailey, T., Durrant-Whyte, H., & Hanebeck, U. D. (2008). On entropy approximation for Gaussian mixture random vectors. In Proceedings of the MFI (pp. 181–188).

  • Johnson, D. H., & Dudgeon, D. E. (1992). Array signal processing: Concepts and techniques. New York: Simon & Schuster.

  • Karray, L., & Martin, A. (2003). Towards improving speech detection robustness for speech recognition in adverse conditions. Speech Communication, 40(3), 261–276.

  • Kim, U. H., Kim, J., Kim, D., Kim, H., & You, B. J. (2008). Speaker localization using the TDOA-based feature matrix for a humanoid robot. In Proceedings of the RO-MAN (pp. 610–615).

  • Knapp, C., & Carter, G. (1976). The generalized cross-correlation method for estimation of time delay. IEEE Transactions on Acoustics, Speech, and Signal Processing, 24(4), 320–327.

  • Kocsis, L., Szepesvári, C., & Willemson, J. (2006). Improved Monte-Carlo search. Technical Report 1, University of Tartu.

  • Latombe, J. C. (1991). Robot motion planning. Dordrecht: Kluwer.

  • LaValle, S. M. (2006). Planning algorithms. Cambridge: Cambridge University Press.

  • Lu, Y. C., & Cooke, M. (2011). Motion strategies for binaural localisation of speech sources in azimuth and distance by artificial listeners. Speech Communication, 53(5), 622–642.

  • Magassouba, A. (2016). Aural servo: Towards an alternative approach to sound localization for robot motion control. Ph.D. thesis, Université Rennes 1.

  • Marković, I., Portello, A., Danès, P., Petrović, I., & Argentieri, S. (2013). Active speaker localization with circular likelihoods and bootstrap filtering. In Proceedings of the IROS (pp. 2914–2920).

  • Martinson, E., & Schultz, A. (2006). Auditory evidence grids. In Proceedings of the IROS (pp. 1139–1144).

  • Martinson, E., & Schultz, A. (2009). Discovery of sound sources by an autonomous mobile robot. Autonomous Robots, 27, 221–237.

  • Marzinzik, M., & Kollmeier, B. (2002). Speech pause detection for noise spectrum estimation by tracking power envelope dynamics. IEEE Transactions on Speech and Audio Processing, 10(2), 109–118.

  • Nakadai, K., Lourens, T., Okuno, H. G., & Kitano, H. (2000). Active audition for humanoid. In Proceedings of the AAAI (pp. 832–839).

  • Nakadai, K., Okuno, H. G., & Kitano, H. (2002). Real-time sound source localization and separation for robot audition. In Proceedings of the Interspeech (pp. 193–196).

  • Nakadai, K., Okuno, H. G., & Kitano, H. (2003). Robot recognizes three simultaneous speech by active audition. In Proceedings of the ICRA (pp. 398–405).

  • Nakadai, K., Takahashi, T., Okuno, H. G., Nakajima, H., Hasegawa, Y., & Tsujino, H. (2010). Design and implementation of robot audition system ’HARK’—Open source software for listening to three simultaneous speakers. Advanced Robotics, 24(5–6), 739–761.

  • Nakamura, K., Nakadai, K., & Ince, G. (2012). Real-time super-resolution sound source localization for robots. In Proceedings of the IROS (pp. 694–699).

  • Nguyen, Q. V. (2018). Mapping of a sound environment by a mobile robot. Ph.D. thesis, University of Lorraine.

  • Nguyen, Q. V., Colas, F., Vincent, E., & Charpillet, F. (2016). Localizing an intermittent and moving sound source using a mobile robot. In Proceedings of the IROS (pp. 61–65).

  • Nguyen, Q. V., Colas, F., Vincent, E., & Charpillet, F. (2017). Long-term robot motion planning for active sound source localization with Monte Carlo tree search. In Proceedings of the HSCMA (pp. 61–65).

  • Okuno, H. G., & Nakadai, K. (2015). Robot audition: Its rise and perspectives. In Proceedings of the ICASSP (pp. 5610–5614).

  • Popoviciu, T. (1935). Sur les équations algébriques ayant toutes leurs racines réelles. Mathematica (Cluj), 9, 129–145.

  • Portello, A., Bustamante, G., Danès, P., Piat, J., & Manhès, J. (2014). Active localization of an intermittent sound source from a moving binaural sensor. In Proceedings of the Forum Acusticum.

  • Portello, A., Danès, P., & Argentieri, S. (2011). Acoustic models and Kalman filtering strategies for active binaural sound localization. In Proceedings of the IROS (pp. 137–142).

  • Portello, A., Danès, P., & Argentieri, S. (2012). Active binaural localization of intermittent moving sources in the presence of false measurements. In Proceedings of the IROS (pp. 3294–3299).

  • Ramírez, J., Górriz, J. M., & Segura, J. C. (2007). Voice activity detection: Fundamentals and speech recognition system robustness. In M. Grimm & K. Kroschel (Eds.), Robust speech recognition and understanding. Vienna: Intech.

  • Ramirez, J., Segura, J. C., Benitez, C., de la Torre, A., & Rubio, A. J. (2003). A new adaptive long-term spectral estimation voice activity detector. In Proceedings of the Eurospeech.

  • Schmidt, R. (1986). Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3), 276–280.

  • Schymura, C., Grajales, J. D. R., & Kolossa, D. (2017). Monte Carlo exploration for active binaural localization. In Proceedings of the ICASSP (pp. 491–495).

  • Siegwart, R., Nourbakhsh, I. R., & Scaramuzza, D. (2011). Introduction to autonomous mobile robots. Cambridge: MIT Press.

  • Siotani, M. (1964). Tolerance regions for a multivariate normal population. Annals of the Institute of Statistical Mathematics, 16(1), 135–153.

  • Sohn, J., Kim, N. S., & Sung, W. (1999). A statistical model-based voice activity detection. IEEE Signal Processing Letters, 6(1), 1–3.

  • Song, K., Liu, Q., & Wang, Q. (2011). Olfaction and hearing based mobile robot navigation for odor/sound source search. Sensors, 11, 2129–2154.

  • Tanyer, S. G., & Ozer, H. (2000). Voice activity detection in nonstationary noise. IEEE Transactions on Speech and Audio Processing, 8(4), 478–482.

  • Valin, J. M., Michaud, F., & Rouat, J. (2007). Robust localization and tracking of simultaneous moving sound sources using beamforming and particle filtering. Robotics and Autonomous Systems, 55(3), 216–228.

  • Valin, J. M., Yamamoto, S., Rouat, J., Michaud, F., Nakadai, K., & Okuno, H. (2007). Robust recognition of simultaneous speech by a mobile robot. IEEE Transactions on Robotics, 23(4), 742–752.

  • Van Veen, B. D., & Buckley, K. M. (1988). Beamforming: A versatile approach to spatial filtering. IEEE ASSP Magazine, 5(2), 4–24.

  • Vermaak, J., & Blake, A. (2001). Nonlinear filtering for speaker tracking in noisy and reverberant environments. In Proceedings of the ICASSP (Vol. 5, pp. 3021–3024).

  • Vincent, E., Sini, A., & Charpillet, F. (2015). Audio source localization by optimal control of a mobile robot. In Proceedings of the ICASSP (pp. 5630–5634).

  • Wightman, F. L., & Kistler, D. J. (1999). Resolution of front-back ambiguity in spatial hearing by listener and source movement. The Journal of the Acoustical Society of America, 105(5), 2841–2853.

  • Woo, K. H., Yang, T. Y., Park, K. J., & Lee, C. (2000). Robust voice activity detection algorithm for estimating noise spectrum. IET Electronics Letters, 36(2), 180–181.

  • Yamauchi, B. (1997). A frontier-based approach for autonomous exploration. In Proceedings of the CIRA (pp. 146–151).

  • Zhang, X. L., & Wu, J. (2013). Deep belief networks based voice activity detection. IEEE Transactions on Audio, Speech, and Language Processing, 21(4), 697–710.


Acknowledgements

Experiments presented in this paper were carried out using the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).

Author information

Corresponding author

Correspondence to Quan V. Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Chernoff–Hoeffding inequality

The UCT criterion derives from the Chernoff–Hoeffding inequality, which holds for a bounded reward function (Hoeffding 1963). The inequality is stated in the theorem below.

Theorem: Let \(Y_1, Y_2, \ldots , Y_n\) be independent random variables taking values in the range \([a, b]\). Denote by \(\mu _i = {\mathbb {E}}(Y_i)\) their expected values, and let \(Y = \frac{1}{n}\sum _iY_i\) and \(\mu ={\mathbb {E}}(Y)=\frac{1}{n}\sum _i\mu _i\). Then for all \(\epsilon >0\), we have:

$$\begin{aligned} P(|Y-\mu |>\epsilon )\le 2e^{-2n\epsilon ^2/(b-a)^2}. \end{aligned}$$
(41)
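
As an illustration, the bound in (41) can be compared with an empirical deviation rate. The sketch below is not part of the original experiments: it assumes uniform draws on \([a, b]\), and the function names and the values of n and \(\epsilon\) are chosen purely for illustration.

```python
import math
import random

def hoeffding_bound(n, eps, a, b):
    """Right-hand side of Eq. (41): 2 * exp(-2 n eps^2 / (b - a)^2)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2 / (b - a) ** 2)

def empirical_deviation_rate(n, eps, a, b, trials=2000, seed=0):
    """Fraction of trials where |Y - mu| > eps, with Y the mean of n uniform draws on [a, b]."""
    rng = random.Random(seed)
    mu = 0.5 * (a + b)  # expected value of each draw, hence of the mean
    exceed = 0
    for _ in range(trials):
        y = sum(rng.uniform(a, b) for _ in range(n)) / n
        if abs(y - mu) > eps:
            exceed += 1
    return exceed / trials

# Illustrative values only (assumptions, not taken from the paper).
n, eps, a, b = 50, 0.1, 0.0, 1.0
bound = hoeffding_bound(n, eps, a, b)
rate = empirical_deviation_rate(n, eps, a, b)
print(f"Hoeffding bound = {bound:.4f}, empirical rate = {rate:.4f}")
```

The empirical rate is expected to fall well below the bound, since the inequality holds for any distribution supported on \([a, b]\) and is therefore loose for a specific one.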

In our setting, \(Y_i\) is the reward Q obtained at the end of each simulation and Y is the average reward \(\frac{{\bar{Q}}(n')}{N(n')}\) at each child node \(n'\) of the tree. For the Chernoff–Hoeffding inequality to apply, the two criteria, entropy and standard deviation, must therefore be bounded. We show that this is the case in the following subsections.
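
To make the role of the average reward concrete, the following is a minimal sketch of a UCT-style child selection in its common UCB1 form. The exact expression and exploration constant used in the article are not reproduced in this appendix, so the constant `c`, the function names and the tuple layout are assumptions, and rewards are assumed to be rescaled to [0, 1].

```python
import math

def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Average reward plus an exploration bonus; unvisited children score +inf."""
    if visits == 0:
        return math.inf
    mean = total_reward / visits  # plays the role of Q_bar(n') / N(n')
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    children: list of (total_reward, visits) tuples (hypothetical layout).
    """
    scores = [uct_score(q, n, parent_visits, c) for q, n in children]
    return max(range(len(children)), key=scores.__getitem__)

# A rarely visited child with a good average is preferred over a well-explored one.
print(select_child([(6.0, 10), (2.5, 3)], parent_visits=13))
```

The exploration bonus shrinks as a child is visited more often, which mirrors how the confidence interval given by the inequality tightens with the number of samples.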

1.1 Bounded entropy

For a scalar random variable X taking values in the range \([a, b]\) with no other constraints, the maximum entropy distribution of X is the uniform distribution over this range, whose density \(p(X) = \frac{1}{b-a}\) is constant. The maximum entropy is then:

$$\begin{aligned} \begin{aligned} H_{\mathrm {max,X}}&= -\int ^b_a{p(X)\log {p(X)}}dX \\&= -\log {p(X)}\int ^b_a{p(X)}dX \\&= -\log {p(X)} \\&= -\log {\frac{1}{(b-a)}} \\&= \log (b-a) \\&= \log \left| \mathrm {range}_{X}\right| \end{aligned} \end{aligned}$$
(42)

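For the belief, the maximum entropy is that of the uniform distribution over the state vector, which factorizes over its components, so the per-component terms add up: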

$$\begin{aligned} \begin{aligned} H_{\mathrm {max}}&=\log {\left| \mathrm {range}_{x_{\mathrm {r}}}\right| } + \log {\left| \mathrm {range}_{y_{\mathrm {r}}}\right| } +\log {\left| \mathrm {range}_{\theta _{\mathrm {r}}}\right| } \\&\quad +\log {\left| \mathrm {range}_{x_{\mathrm {s}}}\right| } +\log {\left| \mathrm {range}_{y_{\mathrm {s}}}\right| }+ \log {\left| \mathrm {range}_{\theta _{\mathrm {s}}}\right| } \\&\quad +\log {\left| \mathrm {range}_{v_{\mathrm {s}}}\right| }+\log {\left| \mathrm {range}_{w_{\mathrm {s}}}\right| } \\&=20.3027. \end{aligned} \end{aligned}$$
(43)
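
The computation in (42) and (43) amounts to summing the log-widths of the state components. The sketch below assumes natural logarithms and uses hypothetical ranges, since the ranges behind the value 20.3027 are not listed in this appendix; it is not meant to reproduce that value.

```python
import math

def uniform_entropy(a, b):
    """Differential entropy of the uniform distribution on [a, b], as in Eq. (42)."""
    if b <= a:
        raise ValueError("need a < b")
    return math.log(b - a)

def max_belief_entropy(ranges):
    """Entropy of a uniform distribution over a box: sum of per-component log-widths."""
    return sum(uniform_entropy(a, b) for a, b in ranges.values())

# Hypothetical ranges for illustration only (not the values used in the article).
example_ranges = {
    "x_r": (0.0, 10.0), "y_r": (0.0, 10.0), "theta_r": (-math.pi, math.pi),
    "x_s": (0.0, 10.0), "y_s": (0.0, 10.0), "theta_s": (-math.pi, math.pi),
    "v_s": (0.0, 1.0), "w_s": (-1.0, 1.0),
}
print(round(max_belief_entropy(example_ranges), 4))
```

Note that a component with a range narrower than one unit contributes a negative term, which is expected for differential entropy.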

In theory, the minimum entropy, reached with perfect knowledge, is \(-\infty \), but it is not achievable in practice. We therefore compute a practical lower bound as follows. To find the minimum entropy of the estimated belief, we start the belief propagation with perfect knowledge of the source position. In the nonlinear MKF, this is represented by a single active-source hypothesis whose source position variance is equal to 0. After the prediction step, there is one active-source hypothesis with a higher weight and one inactive-source hypothesis with a lower weight, and uncertainty appears due to the process noise Q in the dynamic model. We then evaluate the uncertainty of the source location estimate after the update step. We search for the angle from the robot to the source, the distance from the robot to the source in the range from 0.18 m to 8 m, and the AoA observation for which the entropy of the belief after this update step is minimum. The resulting minimum entropy is \(H_{\mathrm {min}} = -38.7824\), obtained for an AoA of \(176^{\circ }\), which does not suffer from the front-back ambiguity, and a distance of 0.18 m from the robot to the source.

The entropy is thus bounded above by the entropy of the uniform distribution over the state vector, and below by the entropy of the belief obtained when there is no front-back ambiguity and the sound source is as close as possible to the robot (0.18 m due to the size of the robot).
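
One possible use of these two bounds is to rescale the entropy to a reward in [0, 1] before it enters the tree search. The linear mapping below is an assumption made for illustration, not necessarily the one used in the article; only the two constants come from the text above.

```python
H_MIN = -38.7824  # lower bound reported in the text
H_MAX = 20.3027   # upper bound from Eq. (43)

def entropy_to_reward(h, h_min=H_MIN, h_max=H_MAX):
    """Map an entropy value to [0, 1]; lower entropy gives a higher reward.

    The linear rescaling and the clipping are illustrative assumptions.
    """
    r = (h_max - h) / (h_max - h_min)
    return min(1.0, max(0.0, r))  # clip in case h falls slightly outside the bounds

for h in (H_MAX, 0.0, H_MIN):
    print(h, round(entropy_to_reward(h), 4))
```

With such a rescaling, the reward range is \(b-a=1\), which is the setting in which the inequality (41) is usually applied.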

1.2 Bounded standard deviation

Let a and b be respectively the lower and upper bounds on the values of a random variable, whatever its probability distribution. Then, according to Popoviciu’s inequality on variances (Popoviciu 1935), its variance \(\sigma ^2\) satisfies:

$$\begin{aligned} {\sigma ^2}\le {\frac{1}{4}}(b-a)^{2}, \end{aligned}$$
(44)

Since \(\sigma \ge 0\), its standard deviation is therefore bounded as follows:

$$\begin{aligned} 0\le {\sigma }\le {\frac{1}{2}}(b-a). \end{aligned}$$
(45)
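
The bound in (44) can be checked numerically. The sketch below uses arbitrary illustrative values of a and b (assumptions, not from the article) and compares two empirical distributions on \([a, b]\): uniform samples, and a two-point distribution with half of its mass on each endpoint, which attains the bound.

```python
import random
import statistics

def popoviciu_std_bound(a, b):
    """Upper bound on the standard deviation implied by Eq. (44): (b - a) / 2."""
    return 0.5 * (b - a)

a, b = -2.0, 3.0  # illustrative bounds
rng = random.Random(0)
uniform_samples = [rng.uniform(a, b) for _ in range(10000)]
two_point = [a, b] * 5000  # half of the mass on each endpoint

bound = popoviciu_std_bound(a, b)
print("uniform  :", statistics.pstdev(uniform_samples), "<=", bound)
print("two-point:", statistics.pstdev(two_point), "<=", bound)
```

The two-point case shows that the bound cannot be tightened without further assumptions on the distribution.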

About this article

Cite this article

Nguyen, Q.V., Colas, F., Vincent, E. et al. Motion planning for robot audition. Auton Robot 43, 2293–2317 (2019). https://doi.org/10.1007/s10514-019-09880-1

  • DOI: https://doi.org/10.1007/s10514-019-09880-1
