Speech Recognition -- G22.3033-001

Speech Recognition
Course#: G22.3033-009
Instructor: Mehryar Mohri
Mailing List

Description

This course gives a computer science presentation of automatic speech recognition, the problem of accurately transcribing spoken utterances. It covers the essential algorithms for creating large-scale speech recognition systems. The algorithms and techniques presented are now used in most research and industrial systems.

Many of the learning and search algorithms and techniques currently used in natural language processing, computational biology, and other application areas of machine learning were originally designed to tackle speech recognition problems. Speech recognition continues to supply computer science with challenging problems, in particular because of the size of the learning and search problems it generates.

The objective of the course is thus not just to familiarize students with particular algorithms used in speech recognition, but rather to use them as a basis for exploring general text, speech, and machine learning algorithms relevant to a variety of other areas in computer science. The course will make use of several software libraries and will study recent research and publications in this area.

Lectures

Here are some of the topics covered by this course.

Lecture 01: introduction to speech recognition, formulation, components, features.
Lecture 02: weighted transducer software library.
Lecture 03: weighted automata algorithms.
Lecture 04: statistical language modeling software library.
Lecture 05: n-gram models.
Lecture 06: maximum entropy models.
Lecture 07: acoustic models, Gaussian mixture models, hidden Markov models.
Lecture 08: pronunciation models, decision trees, context-dependent models.
Lecture 09: search algorithms, transducer optimizations, Viterbi decoder.
Lecture 10: N-best algorithms, lattice generation, rescoring.
Lecture 11: structured prediction algorithms.
Lecture 12: adaptation.
Lecture 13: active learning.
Lecture 14: semi-supervised learning.

Reading and Software Material

There is no single textbook covering the material presented in this course. The following are some recommended books or papers. An extensive list of recommended papers for further reading is provided in the lecture slides.

Books

Frederick Jelinek. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1998.
Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.

Papers

B. H. Juang and L. R. Rabiner. Automatic Speech Recognition - A Brief History of the Technology. Elsevier Encyclopedia of Language and Linguistics, Second Edition, 2005.
Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.
Lawrence Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

Software

FSM Library (Finite-State Machine Library).
OpenFst Library (Finite-State Transducer Library).
GRM Library (Grammar Library).
DCD Library (Decoder Library).

Location and Time

Room 1013 Warren Weaver Hall,
251 Mercer Street.
Mondays 5:00 PM - 6:50 PM.

Prerequisite

Familiarity with basics in linear algebra, probability, and analysis of algorithms. No specific knowledge about signal processing or other engineering material is required.

Interest in theoretical and applied machine learning will be helpful, as will prior acquaintance with natural language processing or with machine learning concepts as presented in "Foundations of Machine Learning" or the Ph.D. seminar in machine learning.

Coursework

3 assignments.

As in all CS courses, the standard high level of integrity is expected from all students.

Homework assignments