Harnessing Music to Enhance Speech Recognition

  • Conference paper
Advances in Usability, User Experience and Assistive Technology (AHFE 2018)

Abstract

The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voice in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies employed noise to produce this reflex, which may be unpleasant to speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two types of music, rhythmic dance music and calm yoga music, as well as to white noise, a metronome sound, and silence, and the speakers’ speech rate and intensity were compared across these listening conditions. Several cohort trends were observed. Speech intensity was notably higher in the rhythmic dance music condition for most subjects. This change was not observed for the metronome sound, which had a similar rhythm. Speech rate decreased in the yoga music condition for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. This effect, when further explored, may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.
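The comparison described in the abstract, a speaker's intensity in each listening condition relative to silence, can be illustrated with a minimal sketch. The abstract does not state how intensity was measured, so the RMS-level measure, the function names `rms_db` and `intensity_change_db`, the condition labels, and the synthetic sine-wave "recordings" below are all assumptions made for illustration, not the authors' method or data.

```python
import math


def rms_db(samples):
    """Return the RMS level of a signal in dB relative to an amplitude of 1.0."""
    if not samples:
        raise ValueError("empty signal")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so that an all-zero signal does not hit log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def intensity_change_db(recordings, baseline="silence"):
    """Map each non-baseline condition to its RMS level minus the baseline level."""
    base = rms_db(recordings[baseline])
    return {
        name: rms_db(signal) - base
        for name, signal in recordings.items()
        if name != baseline
    }


def sine(amplitude, n=8000, freq=200.0, rate=8000.0):
    """Synthetic stand-in for a speech recording (hypothetical test signal)."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


# Hypothetical data for one speaker: the "dance_music" signal has twice the amplitude.
recordings = {
    "silence": sine(0.10),
    "dance_music": sine(0.20),
    "metronome": sine(0.10),
}
changes = intensity_change_db(recordings)
for condition, delta in changes.items():
    print(f"{condition}: {delta:+.2f} dB relative to silence")
```

In this toy input, doubling the amplitude corresponds to a rise of about 6 dB, while the unchanged signal shows no difference. A per-speaker table of such differences is one plausible way to look for the individual trends the abstract mentions.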

Acknowledgments

We thank Mr. Molefi Makuebu, University of the Witwatersrand, Johannesburg, for his help with the signal preprocessing.

Author information

Corresponding author

Correspondence to Vered Aharonson.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Aharonson, V., Mualem, S., Aharonson, E. (2019). Harnessing Music to Enhance Speech Recognition. In: Ahram, T., Falcão, C. (eds) Advances in Usability, User Experience and Assistive Technology. AHFE 2018. Advances in Intelligent Systems and Computing, vol 794. Springer, Cham. https://doi.org/10.1007/978-3-319-94947-5_39

  • DOI: https://doi.org/10.1007/978-3-319-94947-5_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94946-8

  • Online ISBN: 978-3-319-94947-5

  • eBook Packages: Engineering, Engineering (R0)
