Harnessing Music to Enhance Speech Recognition

Aharonson, Vered; Mualem, Shany; Aharonson, Eran

doi:10.1007/978-3-319-94947-5_39

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 794)

Included in the following conference series:

International Conference on Applied Human Factors and Ergonomics

Abstract

The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is also affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voices in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies used noise to elicit this reflex, which may be unpleasant for the speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two types of music, rhythmic dance music and calm yoga music, as well as to white noise, a metronome and silence, and their speech rate and intensity were compared across these listening conditions. Several cohort trends were observed. For most subjects, speech intensity was particularly high in the rhythmic dance music condition. This change was not observed with the metronome, which had a similar rhythm. Speech rate decreased in the yoga music condition, for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. If further explored, this effect may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.
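
The abstract does not say how speech intensity was quantified. As a minimal sketch, assuming intensity is taken as the RMS level in dB of fixed 25 ms frames (400 samples at 16 kHz) and compared against the same speaker's silence condition, the per-condition comparison could look like the code below. The function names, frame length and choice of silence as the baseline are illustrative assumptions, not the authors' procedure, and speech rate (the other variable compared) would need a separate syllable- or phoneme-rate estimator that is not sketched here.

```python
import math
from statistics import mean


def rms_db(samples, eps=1e-12):
    """RMS level of a sample sequence in dB relative to full scale (amplitude 1.0)."""
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # eps guards against log10(0) for an all-zero (digital silence) frame.
    return 20.0 * math.log10(max(rms, eps))


def frame_levels(samples, frame_len):
    """Split a signal into non-overlapping frames and return each frame's level in dB.

    A trailing partial frame is discarded.
    """
    return [
        rms_db(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def intensity_change(condition, baseline, frame_len=400):
    """Mean frame level of a listening condition minus that of a baseline, in dB.

    Hypothetical helper: no voice activity detection is applied, so pauses
    between words pull both means down; a real analysis would first drop
    non-speech frames.
    """
    return mean(frame_levels(condition, frame_len)) - mean(frame_levels(baseline, frame_len))


if __name__ == "__main__":
    # Synthetic stand-ins for two recordings of one speaker: a 200 Hz tone
    # sampled at 16 kHz at two amplitudes. Doubling the amplitude adds
    # 20*log10(2) dB to every frame.
    fs = 16000
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
    speech_in_silence = [0.1 * x for x in tone]
    speech_with_music = [0.2 * x for x in tone]
    print(f"{intensity_change(speech_with_music, speech_in_silence):.2f} dB")  # 6.02 dB
```

Comparing each speaker against their own silence recording, rather than pooling absolute levels across speakers, matches the abstract's emphasis on individual differences: a positive value indicates that the speaker talked louder in that listening condition.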

Acknowledgments

We thank Mr. Molefi Makuebu, University of the Witwatersrand, Johannesburg, for his help with the signal preprocessing.

Author information

Authors and Affiliations

School of Electrical and Information Engineering, University of the Witwatersrand, Johannesburg, South Africa
Vered Aharonson
Department of Biomedical Engineering, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv, Israel
Vered Aharonson & Shany Mualem
Department of Software Engineering, Afeka, Tel Aviv Academic College of Engineering, Tel Aviv, Israel
Eran Aharonson

Corresponding author

Correspondence to Vered Aharonson.

Editor information

Editors and Affiliations

University of Central Florida, Orlando, FL, USA
Tareq Z. Ahram
Catholic University of Pernambuco, Boa Viagem, Pernambuco, Brazil
Christianne Falcão

About this paper

Cite this paper

Aharonson, V., Mualem, S., Aharonson, E. (2019). Harnessing Music to Enhance Speech Recognition. In: Ahram, T., Falcão, C. (eds) Advances in Usability, User Experience and Assistive Technology. AHFE 2018. Advances in Intelligent Systems and Computing, vol 794. Springer, Cham. https://doi.org/10.1007/978-3-319-94947-5_39

DOI: https://doi.org/10.1007/978-3-319-94947-5_39
Published: 28 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94946-8
Online ISBN: 978-3-319-94947-5
eBook Packages: Engineering, Engineering (R0)
