Automatic Speech Recognition

Description

Automatic speech recognition is the problem of using computer programs to generate accurate written transcriptions of spoken utterances. Many of the algorithms and techniques presented in the papers referenced here were introduced for the design of real-time large-vocabulary speech recognition systems at AT&T Bell Labs, or later at AT&T Labs - Research. These algorithms and techniques and, more broadly, the mathematical framework the papers describe, which is based on weighted finite-state transducers, have since been adopted by most major large-vocabulary speech recognition systems.

Here is a screenshot of a real-time Broadcast News speech recognition system demonstration based on these algorithms and techniques.

[Screenshot: VLVR demo]


Related Publications
Mehryar Mohri.
Statistical Natural Language Processing.
In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.

Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley.
Speech Recognition with Weighted Finite-State Transducers.
In Larry Rabiner and Fred Juang, editors, Handbook on Speech Processing and Speech Communication, Part E: Speech Recognition. Springer-Verlag, Heidelberg, Germany, 2008.

Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley.
Weighted Finite-State Transducers in Speech Recognition.
Computer Speech and Language, 16(1):69-88, 2002.

Cyril Allauzen, Mehryar Mohri, Brian Roark, and Michael Riley.
A Generalized Construction of Integrated Speech Recognition Transducers.
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004). Montréal, Canada, May 2004.

Cyril Allauzen and Mehryar Mohri.
Generalized Optimization Algorithm for Speech Recognition Transducers.
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003). Hong Kong, April 2003.

Mehryar Mohri and Michael Riley.
A Weight Pushing Algorithm for Large Vocabulary Speech Recognition.
In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech '01). Aalborg, Denmark, September 2001.

Mehryar Mohri and Michael Riley.
Integrated Context-Dependent Networks in Very Large Vocabulary Speech Recognition.
In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech '99). Budapest, Hungary, 1999.

Mehryar Mohri and Michael Riley.
Network Optimizations for Large Vocabulary Speech Recognition.
Speech Communication, 28(1):1-12, 1999.

Mehryar Mohri, Michael Riley, Don Hindle, Andrej Ljolje, and Fernando C. N. Pereira.
Full Expansion of Context-Dependent Networks in Large Vocabulary Speech Recognition.
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98). Seattle, Washington, 1998.

Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley.
Weighted Automata in Text and Speech Processing.
In Proceedings of the 12th Biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended Finite State Models of Language. Budapest, Hungary, 1996. John Wiley and Sons, Chichester.