Cyril Allauzen
I am a Research Scientist at the Courant Institute. Before joining Courant, I was a member of the Speech Algorithms Department at AT&T Labs - Research. I obtained my Ph.D. in 2001 from the Institut Gaspard-Monge at the Université Paris-Est Marne-la-Vallée.
Update: As of January 2008, I am a Research Scientist at Google.
Research Interests
My current research interests are:
- weighted automata and finite-state transducers (theory and algorithms),
- machine learning (kernel methods),
- natural language processing (speech recognition, speech synthesis),
- text algorithms (string matching, indexing).
Software
- Finite-State Transducer Library (OpenFst Library):
An open-source software library for constructing, combining, optimizing, and searching weighted finite-state transducers.
- Kernel Library (OpenKernel Library):
An open-source software library for creating, combining, learning, and using kernels for machine learning applications.
- Grammar Library (GRM Library):
A general software collection for constructing and modifying weighted automata and transducers representing weighted grammars or statistical language models.
Publications
- [1]
- Cyril Allauzen and Michael Riley.
Bayesian language model interpolation for mobile speech input.
In Interspeech 2011, pages 1429-1432, 2011.
- [2]
- Jeffrey Sorensen and Cyril Allauzen.
Unary data structures for language models.
In Interspeech 2011, pages 1425-1428, 2011.
- [3]
- Gonzalo Iglesias, Cyril Allauzen, William Byrne, Adrià de Gispert, and Michael Riley.
Hierarchical phrase-based translation representations.
In Proceedings of the 2011 Conference on Empirical Methods in Natural
Language Processing (EMNLP 2011). Association for Computational
Linguistics, 2011.
- [4]
- Cyril Allauzen, Mehryar Mohri, and Ashish Rastogi.
General algorithms for testing the ambiguity of finite automata and the double-tape ambiguity of finite-state transducers.
International Journal of Foundations of Computer Science,
22(4):883-904, 2011.
- [5]
- Cyril Allauzen, Corinna Cortes, and Mehryar Mohri.
Large-scale training of SVMs with automata kernels.
In Implementation and Applications of Automata, 15th International
Conference, CIAA 2010, volume 6482 of Lecture Notes in Computer
Science, pages 17-27. Springer, 2011.
- [6]
- Cyril Allauzen, Michael Riley, and Johan Schalkwyk.
Filters for efficient composition of weighted finite-state transducers.
In Implementation and Applications of Automata, 15th International
Conference, CIAA 2010, volume 6482 of Lecture Notes in Computer
Science, pages 28-38. Springer, 2011.
- [7]
- Brandon Ballinger, Cyril Allauzen, Alexander Gruenstein, and Johan Schalkwyk.
On-demand language model interpolation for mobile speech input.
In Interspeech 2010, pages 1812-1815, 2010.
- [8]
- Cyril Allauzen, Shankar Kumar, Wolfgang Macherey, Mehryar Mohri, and Michael Riley.
Expected sequence similarity maximization.
In Human Language Technologies: The 2010 Annual Conference of the North
American Chapter of the Association for Computational Linguistics,
pages 957-965, Los Angeles, California, June 2010. Association for
Computational Linguistics.
- [9]
- Cyril Allauzen, Michael Riley, and Johan Schalkwyk.
A generalized composition algorithm for weighted finite-state transducers.
In Interspeech 2009, pages 1203-1206. ISCA, 2009.
- [10]
- Cyril Allauzen and Mehryar Mohri.
N-way composition of weighted finite-state transducers.
International Journal of Foundations of Computer Science,
20(4):613-627, 2009.
- [11]
- Cyril Allauzen, Mehryar Mohri, and Ashish Rastogi.
General algorithms for testing the ambiguity of finite automata.
In Developments in Language Theory, 12th International Conference, DLT
2008, volume 5257 of Lecture Notes in Computer Science,
pages 108-120. Springer, 2008.
- [12]
- Cyril Allauzen, Mehryar Mohri, and Ameet Talwalkar.
Sequence kernels for predicting protein essentiality.
In Machine Learning, Proceedings of the Twenty-Fifth International
Conference (ICML 2008), volume 307 of ACM International
Conference Proceeding Series, pages 9-16. ACM, 2008.
- [13]
- Cyril Allauzen and Mehryar Mohri.
3-way composition of weighted finite-state transducers.
In Implementation and Applications of Automata, 13th International
Conference, CIAA 2008, volume 5148 of Lecture Notes in Computer
Science, pages 262-273. Springer, 2008.
- [14]
- Cyril Allauzen and Mehryar Mohri.
N-way composition of weighted finite-state transducers.
Technical Report TR2007-902, Department of Computer Science, Courant Institute
of Mathematical Sciences, New York University, August 2007.
- [15]
- Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri.
OpenFst: a general and efficient weighted finite-state transducer library.
In Proceedings of the 12th International Conference on Implementation and
Application of Automata (CIAA 2007), volume 4783 of Lecture
Notes in Computer Science, pages 11-23. Springer, 2007.
- [16]
- Cyril Allauzen and Mehryar Mohri.
A unified construction of the Glushkov, follow, and Antimirov automata.
In Proceedings of the 31st International Symposium on Mathematical
Foundations of Computer Science (MFCS 2006), volume 4162 of
Lecture Notes in Computer Science, pages 110-121. Springer,
2006.
- [17]
- Cyril Allauzen and Mehryar Mohri.
A unified construction of the Glushkov, follow and Antimirov automata.
Technical Report TR2006-880, Department of Computer Science, Courant Institute
of Mathematical Sciences, New York University, April 2006.
- [18]
- Cyril Allauzen and Mehryar Mohri.
The design principles and algorithms of a weighted grammar library.
International Journal of Foundations of Computer Science,
16(3):403-421, 2005.
- [19]
- Sarangarajan Parthasarathy, Cyril Allauzen, and Rungsun Munkong.
Robust access to large structured data using voice form-filling.
In Proceedings of Eurospeech-Interspeech 2005, pages 2493-2496,
2005.
- [20]
- Vincent Goffin, Cyril Allauzen, Enrico Bocchieri, Dilek Hakkani-Tür, Andrej Ljolje, Sarangarajan Parthasarathy,
Mazin Rahim, Giuseppe Riccardi, and Murat Saraclar.
The AT&T Watson speech recognizer.
In Proceedings of the 2005 IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP'2005), volume 1, pages
1033-1036, 2005.
- [21]
- Cyril Allauzen, Mehryar Mohri, and Michael Riley.
Statistical modeling for unit selection in speech synthesis.
In Proceedings of the 42nd Annual Meeting of the Association for
Computational Linguistics (ACL'2004), pages 55-62, 2004.
- [22]
- Cyril Allauzen, Mehryar Mohri, and Brian Roark.
A general weighted grammar library.
In Proceedings of the Ninth International Conference on Implementation
and Application of Automata (CIAA'2004), volume 3317 of Lecture
Notes in Computer Science, pages 23-34. Springer, 2005.
- [23]
- Cyril Allauzen and Mehryar Mohri.
An optimal pre-determinization algorithm for weighted transducers.
Theoretical Computer Science, 328(1-2):3-18, 2004.
- [24]
- Cyril Allauzen, Mehryar Mohri, Michael Riley, and Brian Roark.
A generalized construction of integrated speech recognition transducers.
In Proceedings of the 2004 IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP'2004), volume I, pages 761-764,
2004.
- [25]
- Cyril Allauzen, Mehryar Mohri, and Murat Saraclar.
General indexation of weighted automata - application to spoken utterance retrieval.
In Proceedings of the Workshop on Interdisciplinary Approaches to Speech
Indexing and Retrieval at HLT/NAACL 2004, pages 33-40, 2004.
- [26]
- Cyril Allauzen and Mehryar Mohri.
Finitely subsequential transducers.
International Journal of Foundations of Computer Science,
14(6):983-994, 2003.
- [27]
- Cyril Allauzen and Mehryar Mohri.
An efficient pre-determinization algorithm.
In Proceedings of the Eighth International Conference on Implementation
and Application of Automata (CIAA'2003), volume 2759 of Lecture
Notes in Computer Science, pages 83-95. Springer, 2003.
- [28]
- Cyril Allauzen, Mehryar Mohri, and Brian Roark.
Generalized algorithms for constructing statistical language models.
In Proceedings of the 41st Annual Meeting of the Association for
Computational Linguistics (ACL'2003), pages 40-47, 2003.
- [29]
- Cyril Allauzen and Mehryar Mohri.
Efficient algorithms for testing the twins property.
Journal of Automata, Languages and Combinatorics, 8(2):117-144,
2003.
- [30]
- Cyril Allauzen and Mehryar Mohri.
Generalized optimization algorithm for speech recognition transducers.
In Proceedings of the 2003 IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP'2003), volume I, pages 352-355.
IEEE, 2003.
- [31]
- Cyril Allauzen and Mehryar Mohri.
p-Subsequentiable transducers.
In Proceedings of the Seventh International Conference on Implementation
and Application of Automata (CIAA'2002), volume 2608 of Lecture
Notes in Computer Science, pages 24-34. Springer, 2003.
- [32]
- Cyril Allauzen, Maxime Crochemore, and Mathieu Raffinot.
Efficient experimental string matching by weak factor recognition.
In Proceedings of the 12th Annual Symposium on Combinatorial Pattern
Matching (CPM'2001), volume 2089 of Lecture Notes in Computer
Science, pages 51-72. Springer, 2001.
- [33]
- Cyril Allauzen.
Combinatoire sur les mots et recherche de motifs [Combinatorics on words and pattern matching].
PhD thesis, Université de Marne-la-Vallée, 2001.
- [34]
- Cyril Allauzen and Mathieu Raffinot.
Simple optimal string matching.
Journal of Algorithms, 36(1):102-116, 2000.
- [35]
- Cyril Allauzen and Mathieu Raffinot.
Simple optimal string matching (extended abstract).
In Proceedings of the 11th Annual Symposium on Combinatorial Pattern
Matching (CPM'2000), volume 1848 of Lecture Notes in Computer
Science, pages 364-374. Springer, 2000.
- [36]
- Cyril Allauzen.
Calcul efficace du shuffle de k mots [Efficient computation of the shuffle of k words].
Technical report 2000-02, Institut Gaspard-Monge, Université de
Marne-la-Vallée, 2000.
- [37]
- Cyril Allauzen and Mathieu Raffinot.
Oracle des facteurs d'un ensemble de mots [Factor oracle of a set of words].
Technical report 99-11, Institut Gaspard-Monge, Université de
Marne-la-Vallée, 1999.
- [38]
- Cyril Allauzen, Maxime Crochemore, and Mathieu Raffinot.
Factor oracle: a new structure for pattern matching.
In Proceedings of SOFSEM'99, volume 1725 of Lecture Notes in
Computer Science, pages 295-310. Springer, 1999.
- [39]
- Cyril Allauzen.
Une caractérisation simple des nombres de Sturm [A simple characterization of Sturm numbers].
Journal de Théorie des Nombres de Bordeaux, 10(2):237-241,
1998.
- [40]
- Cyril Allauzen and Bruno Durand.
Tiling problems.
In E. Börger, E. Grädel, and Y. Gurevich, editors, The Classical Decision
Problem. Springer, 1997.
- [41]
- Cyril Allauzen and Bruno Durand.
Pavages du plan: périodicité et décidabilité [Tilings of the plane: periodicity and decidability].
Technical report 95-28, Laboratoire de l'Informatique du Parallélisme,
École Normale Supérieure de Lyon, 1995.
Co-authors