Naomi Sager

Research Professor, Computer Science Department

Department of Computer Science
Courant Institute of Mathematical Sciences
New York University

Courant Institute of Mathematical Sciences • 251 Mercer Street • New York, NY 10012
+1 212.998-3097 (voice) • +1 212.995-4123 (fax) • sager@cs.nyu.edu

Topics

Transformations and Discourse Analysis Project (TDAP), University of Pennsylvania
Linguistic String Project (LSP)
LSP Publications (annotated)
String Program Reports (LSP SPR)

Naomi Sager

Naomi Sager was born in Chicago, IL in 1927. From 1942 to 1946 she attended the newly established Four Year College at the University of Chicago, receiving the degree of Bachelor of Philosophy in 1946. In 1953 Sager obtained a B.S. in Electrical Engineering from Columbia University, and from 1953 to 1958 she worked as an electronics engineer in the Biophysics Department of the Sloan-Kettering Institute for Cancer Research in New York City. One project there was to develop an instrument for continuous blood pressure measurement and control for patients in hemorrhagic shock.¹

Sager began work on the computer processing of language as a member of the team that developed the first English language parsing program that ran on Univac 1 at the University of Pennsylvania in 1959.² Sager's section of the program treated syntactic ambiguity (more than one possible analysis at points in the sentence).³ The structures imposed by the 1959 program proved unwieldy for this task, and in 1960 Sager developed an algorithm and a form of string grammar in which the treatment of ambiguity was an integral part (cf. A Procedure for Left-to-Right Analysis of Sentence Structure [TDAP 27]). This work became the basis of the thesis for which she was awarded a Ph.D. in Linguistics from the University of Pennsylvania in 1968, and of the parsing program first developed at New York University in collaboration with James Morris and Morris Salkoff (SPR 1 & SPR 5).⁴

In the early 1960s, in addition to Sager's work in natural language processing (NLP) at NYU, Susumu Kuno at Harvard University applied his Predictive Analyzer to English syntax.⁵ During this period many machine translation projects were generously funded by the U.S. government, until the Automatic Language Processing Advisory Committee (ALPAC) report (1966) found that too little progress had been made to justify further support. Research in computer parsing of English nevertheless continued at NYU and was later joined by IBM and other groups.

The Linguistic String Project (LSP) at New York University began in 1965 with funding from the National Science Foundation to develop computer methods for structuring and accessing information in the scientific and technical literature. Document processing was to be based on linguistic principles, first to demonstrate the possibility of computerized grammatical analysis (parsing), then to include the specialized vocabulary and rules for particular scientific domains. Domain specialization led to an elaboration of the methods of sublanguage analysis,⁶ in particular, as applied to the language of clinical reporting in patient documents. The 30+ year history of the Linguistic String Project and its results are tracked in the LSP annotated bibliography.

From 1966 to 1984, the NYU Linguistic String Project (LSP) issued a series of volumes, String Program Reports (SPR) 1-16, documenting in detail the development of the first LSP parser and string grammar, their further development, and the research into the structure of information that they facilitated.

The computer string grammar of English at the core of the LSP parsing program was published in 1981.⁷ The grammar was modified to handle the specialized language of clinical documents, and the English lexicon used by the parser was augmented with medical semantic attributes. Post-parsing procedures carried the input documents to a database representation of their textual content suitable for highly specific information querying.⁸ The resulting Medical Language Processor (MLP) is documented at the LSP website.

Sager taught courses in Natural Language Processing and maintained the Linguistic String Project (LSP) at New York University from 1965 until her retirement in 1995. She resides in New York and, for part of each year, in Paris. She was one of the translators from French to English of the autobiography of Ngô Văn, a Vietnamese revolutionary who, while working in a factory in Paris, became an engineer, a published scholar and the author of numerous works.⁹

Footnotes:

¹ "Servomechanism for the Regulation of Blood Pressure," Naomi Sager, John H. Waite, J. William Poppell, and William S. Howland, Review of Scientific Instruments, 28 (1957).
² Transformations and Discourse Analysis Papers 15-19, Dept. of Linguistics, U. of Pa., 1959, and Z.S. Harris, String Analysis of Sentence Structure, Mouton & Co., The Hague, 1962.
³ TDAP 17 on the list of TDAP publications.
⁴ Sager, N., Salkoff, M., Morris, J. and Raze (Friedman), C. (1966). Report on the String Analysis Programs, Introductory Volume. String Program Reports (SPR) No. 1. Linguistic String Project, New York University & University of Pennsylvania.
   Salkoff, M. and Sager, N. (1969). Grammatical Restrictions on the IPLV and FAP String Programs. String Program Reports (SPR) No. 5. NYU Linguistic String Project.
⁵ Kuno, Susumu (1966). The Augmented Predictive Analyzer for context-free languages - its relative efficiency. Commun ACM 11(9): 613-618.
⁶ Harris, Zellig (1968). Sublanguages, in Mathematical Structure of Language, Section 5.9, pp. 152-156. Interscience Publishers, John Wiley & Sons.
   Sager, N. (1982). Syntactic formatting of science information, in Sublanguage: Studies of Language in Restricted Semantic Domains, eds. R. Kittredge & J. Lehrberger. Walter de Gruyter, Berlin, New York.
⁷ Sager, N. (1981). Natural Language Information Processing: A Computer Grammar of English and Its Applications. Addison-Wesley, Reading, MA [LSP34].
⁸ Sager, N., Friedman, C., Lyman, M.S., MD, and members of the LSP (1987). Medical Language Processing: Computer Management of Narrative Data. Addison-Wesley, Reading, MA [LSP65].
⁹ Ngo Van, In the Crossfire: Adventures of a Vietnamese Revolutionary, AK Press, Oakland CA, 2010.

Linguistic String Project

The Linguistic String Project (LSP) was a sustained research effort (1960-2005) in the computer processing of language based on the linguistic theory of Zellig Harris: linguistic string theory, transformational analysis and sublanguage grammar. The programs developed by the Project were adapted for clinical narrative in the LSP Medical Language Processor (LSP-MLP), which supported online access by clinicians to portions of narrative patient documents relevant to stated concerns.

The Linguistic String Project (LSP) at New York University was one of the earliest research and development projects in computer processing of natural (i.e., human) language. It was initiated by Naomi Sager at NYU in 1965 with a grant from the Office of Science Information Services [OSIS] of the National Science Foundation. The OSIS at that time was seeking means to give scientists rapid access to information in the expanding technical literature. One avenue it was pursuing was computer analysis of language that would facilitate pinpointed search and retrieval of requested information.

The LSP approach was to begin with a parsing program to obtain the syntactic relations among sentence words, the basic structure of language-borne information. This entailed the implementation of a parsing algorithm (top-down, left-to-right, with calls on linguistic test procedures) as first described by Sager in 1960 ["A Procedure for Left-to-Right Analysis of Sentence Structure", Report 27 of the series Transformations and Discourse Analysis Papers published by the Dept. of Linguistics, U. of Pa.].

A 1967 article [LSP 1] laid out the basis for language computation and described the first two implementations of the LSP parser and string grammar. The grammar was specified in two components: a set of formal rewriting rules written in Backus Normal Form (BNF) that provided the structure of the output parse tree, and a set of procedures, called restrictions, that operated on the parse tree to enforce detailed grammatical constraints [LSP 5]. An English-like programming language for expressing restrictions, the Restriction Language RL, was developed [LSP 12]. The computer grammar of English that formed an integral part of the system was published in 1981 [LSP 34¹].
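The two-component design described above can be illustrated with a minimal Python sketch. It is not the LSP grammar or the Restriction Language: the lexicon, the three productions, and the names `GRAMMAR`, `RESTRICTIONS`, `agreement`, `parse`, `expand` and `analyses` are all assumptions made for illustration. The sketch pairs BNF-style productions, which determine the shape of the parse tree, with a restriction procedure that is run on each candidate tree node, inside a top-down, left-to-right parser that returns every analysis the grammar allows.

```python
# Toy lexicon (hypothetical): word -> (category, grammatical number).
LEXICON = {
    "the": ("T", None),
    "patient": ("N", "sg"),
    "patients": ("N", "pl"),
    "improves": ("V", "sg"),
    "improve": ("V", "pl"),
}

# BNF-style productions: each nonterminal lists its alternative expansions.
GRAMMAR = {
    "SENTENCE": [["SUBJECT", "VERB"]],
    "SUBJECT": [["T", "N"], ["N"]],
    "VERB": [["V"]],
}


def leaves(tree):
    """Return the (category, word) leaves of a tree, left to right."""
    label, body = tree
    if isinstance(body, str):  # a leaf holds a single word
        return [tree]
    return [leaf for child in body for leaf in leaves(child)]


def agreement(tree):
    """Restriction on SENTENCE: the subject noun and the verb agree in number."""
    number = {cat: LEXICON[word][1] for cat, word in leaves(tree) if cat in ("N", "V")}
    return number["N"] == number["V"]


# Restrictions are attached to the node type whose subtree they test.
RESTRICTIONS = {"SENTENCE": [agreement]}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can be built starting at `pos`."""
    if symbol not in GRAMMAR:  # terminal category: consult the lexicon
        if pos < len(words) and LEXICON.get(words[pos], (None,))[0] == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for option in GRAMMAR[symbol]:  # top-down: try each alternative in order
        for children, end in expand(option, words, pos):
            tree = (symbol, children)
            # Keep the node only if every restriction attached to it succeeds.
            if all(test(tree) for test in RESTRICTIONS.get(symbol, [])):
                yield tree, end


def expand(symbols, words, pos):
    """Match a sequence of symbols left to right, yielding (children, next_pos)."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse(symbols[0], words, pos):
        for rest, end in expand(symbols[1:], words, mid):
            yield [first] + rest, end


def analyses(sentence):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("SENTENCE", words, 0) if end == len(words)]
```

In this sketch, `analyses("the patient improves")` returns one tree, while `analyses("the patient improve")` returns an empty list because the agreement restriction rejects the only tree the productions allow. Returning a list of trees rather than a single tree is one simple way to expose syntactic ambiguity. The sketch assumes a grammar without left recursion.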

Applications of the LSP system drew upon Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules [LSP 11²]. It was further shown that parsed documents could be mapped into sublanguage-labeled structures, called information formats, on which information retrieval procedures could operate [LSP 28]. The LSP concentrated on the sublanguage of clinical reporting (X-ray reports, hospital discharge summaries, and the like), demonstrating an automated application of health care criteria to information-formatted narrative medical reports [LSP 30³]. The collective work of the LSP team on medical records was summarized in a 1987 volume [LSP 65⁴]. The LSP Medical Language Processor, including the medically specialized English grammar and dictionary, is available on the Linguistic String Project website.
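As a rough illustration of the idea of an information format, the Python sketch below fills a fixed set of labeled columns from a sentence and then runs a retrieval over the resulting rows. Everything in it is an assumption for illustration: the word classes, the column labels, and the functions `to_format` and `retrieve` are hypothetical, and the sketch replaces the parsing step with a simple word-class lookup, which is far cruder than the mapping from parse trees described above.

```python
# Hypothetical sublanguage word classes for a tiny radiology vocabulary.
WORD_CLASSES = {
    "chest": "BODY-PART",
    "x-ray": "TEST",
    "shows": "VERB",
    "no": "NEG",
    "infiltrate": "FINDING",
    "effusion": "FINDING",
}

# Hypothetical format columns; one row is filled per sentence.
FORMAT_COLUMNS = ("TEST", "BODY-PART", "VERB", "NEG", "FINDING")


def to_format(sentence):
    """Place each known word of the sentence in the column for its word class."""
    row = {column: [] for column in FORMAT_COLUMNS}
    for word in sentence.lower().rstrip(".").split():
        column = WORD_CLASSES.get(word)
        if column in row:  # unknown words are skipped in this sketch
            row[column].append(word)
    return row


def retrieve(rows, finding, negated=False):
    """Return the rows that mention `finding` with the requested negation status."""
    return [
        row
        for row in rows
        if finding in row["FINDING"] and bool(row["NEG"]) == negated
    ]


rows = [
    to_format("Chest x-ray shows no infiltrate."),
    to_format("Chest x-ray shows effusion."),
]
positive_effusion = retrieve(rows, "effusion")
negated_infiltrate = retrieve(rows, "infiltrate", negated=True)
```

Because every sentence is reduced to the same labeled columns, a query such as "reports with a positive finding of effusion" becomes a test on column contents rather than a search over free text.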

The LSP MLP was converted to French in collaborative research with the Informatics group of the Cantonal Hospital of Geneva, Switzerland [LSP 77][LSP 76⁵]. Subsequently, an XML hierarchy of medical knowledge tags was added to the system, along with an online viewer in which clinicians could see highlighted portions of documents pertaining to particular patient problems or therapies. It was demonstrated by Ngô Thanh Nhàn, its principal designer, as part of Sager's keynote address at the Second International Conference on the Clinical Document Architecture, October 20-22, 2004, in Acapulco, Mexico.

A general overview of methods and results of the LSP was presented by Sager at the New York Academy of Sciences in 1990 [LSP 78]. The ways in which contributions to Linguistics by Zellig Harris were utilized in the development of the LSP system were described by Sager and Ngô Thanh Nhàn in the symposium dedicated to Harris's work [LSP 91].

Notes:

¹ Sager, N. (1981). Natural Language Information Processing: A Computer Grammar of English and Its Applications. Addison-Wesley, Reading, MA.
² Wiley Online Library: JASIS, onlinelibrary.wiley.com/doi/10.1002/asi.4630260104/pdf.
³ Sager, N., Friedman, C., Lyman, M.S., MD, and members of the Linguistic String Project (1987). Medical Language Processing: Computer Management of Narrative Data. Addison-Wesley, Reading, MA.
⁴ Sager, N., Friedman, C., Lyman, M.S., MD, and members of the Linguistic String Project (1987). Medical Language Processing: Computer Management of Narrative Data. Addison-Wesley, Reading, MA.
⁵ Borst, F., Sager, N., Nhàn, N.T., Su, Y., Lyman, M., Tick, L.J., Revillard, C., Chi, E., et Scherrer, J.R. (1989). Analyse Automatique de Comptes Rendus d'Hospitalisation. In Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris, 1989, Degoulet, P., Stephan, J-C., Venot, A., et Yvon, P-J., Rédacteurs. Paris, Springer-Verlag, pp. 246-256.
