Abstract
The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voice in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies employed noise to produce this reflex, which may be unpleasant to the speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two types of music, rhythmic dance music and calm yoga music, as well as to white noise, a metronome sound, and silence, and their speech rate and intensity were compared across these sound conditions. Several cohort trends were observed. Speech intensity was notably higher in the rhythmic dance music condition for most subjects. This change was not observed for the metronome sound, which had a similar rhythm. Speech rate decreased in the yoga music condition for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. This effect, when further explored, may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.
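The per-speaker comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of RMS level in dB as the intensity measure, the units-per-second rate measure, and the two thresholds (`min_db`, `min_rate_drop`) are all assumptions made for illustration. The sketch compares one sound condition against the silence baseline for a single speaker and flags an increase in level and/or a decrease in rate.

```python
import math


def rms_level_db(samples, ref=1.0):
    """Return the RMS level of a sample sequence in dB relative to `ref`."""
    if not samples:
        raise ValueError("samples must be non-empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (ref * ref))


def speech_rate(unit_count, duration_s):
    """Return speech units (e.g. phonemes or syllables) per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return unit_count / duration_s


def lombard_response(baseline, condition, min_db=1.0, min_rate_drop=0.05):
    """Compare one condition with the silence baseline for one speaker.

    `baseline` and `condition` are dicts with keys 'level_db' and 'rate'.
    `min_db` and `min_rate_drop` are hypothetical thresholds, not values
    taken from the paper.
    """
    delta_db = condition["level_db"] - baseline["level_db"]
    rate_change = (condition["rate"] - baseline["rate"]) / baseline["rate"]
    return {
        "delta_db": delta_db,        # positive means louder than baseline
        "rate_change": rate_change,  # negative means slower than baseline
        "louder": delta_db >= min_db,
        "slower": rate_change <= -min_rate_drop,
    }
```

For example, a speaker whose level rises from -20 dB in silence to -17 dB with dance music, and whose rate falls from 4.0 to 3.6 units per second, would be flagged as both louder and slower under these assumed thresholds. A personalized recognizer of the kind the abstract envisions could, in principle, select the music type that produces the strongest such response for each speaker.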
Acknowledgments
We thank Mr. Molefi Makuebu, University of the Witwatersrand, Johannesburg, for his help with the signal preprocessing.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Aharonson, V., Mualem, S., Aharonson, E. (2019). Harnessing Music to Enhance Speech Recognition. In: Ahram, T., Falcão, C. (eds) Advances in Usability, User Experience and Assistive Technology. AHFE 2018. Advances in Intelligent Systems and Computing, vol 794. Springer, Cham. https://doi.org/10.1007/978-3-319-94947-5_39
DOI: https://doi.org/10.1007/978-3-319-94947-5_39
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94946-8
Online ISBN: 978-3-319-94947-5
eBook Packages: Engineering, Engineering (R0)