Naomi Sager

Research Professor, Computer Science Department


Department of Computer Science
Courant Institute of Mathematical Sciences
New York University
251 Mercer Street • New York, NY 10012 • +1 212-998-3097 (voice) • +1 212-995-4123 (fax) • sager@cs.nyu.edu


Naomi Sager was born in Chicago, IL, in 1927. From 1942 to 1946 she attended the newly established Four Year College at the University of Chicago, receiving the degree of Bachelor of Philosophy in 1946. In 1953 Sager obtained a B.S. in Electrical Engineering from Columbia University, and from 1953 to 1958 she worked as an electronics engineer in the Biophysics Department of the Sloan-Kettering Institute for Cancer Research in New York City. One project there was the development of an instrument for continuous measurement and control of blood pressure in patients in hemorrhagic shock ["Servomechanism for the Regulation of Blood Pressure," Naomi Sager, John H. Waite, J. William Poppell, and William S. Howland, Review of Scientific Instruments, 28 (1957)].

Sager began work on the computer processing of language as a member of the team that developed the first English-language parsing program, which ran on the UNIVAC I at the University of Pennsylvania in 1959 [Transformations and Discourse Analysis Papers 15-19, Dept. of Linguistics, U. of Pa., 1959; Z.S. Harris, String Analysis of Sentence Structure, Mouton & Co., The Hague, 1962]. Sager's part of the program treated syntactic ambiguity (more than one possible analysis at a point in the sentence); cf. TDAP 17 in the list of TDAP publications. The structures imposed by the 1959 program proved unwieldy for this task, and in 1960 Sager developed an algorithm and a form of string grammar in which the treatment of ambiguity was an integral part ("A Procedure for Left-to-Right Analysis of Sentence Structure" [TDAP 27]). This work served as the basis for the founding of the Linguistic String Project at New York University in 1965, and it became the basis of her Ph.D. thesis; she was awarded a Ph.D. in Linguistics by the University of Pennsylvania in 1968.

The computer string grammar of English at the core of the LSP parsing program was published in 1981: N. Sager, Natural Language Information Processing: A Computer Grammar of English and Its Applications (Addison-Wesley, Reading, MA) [LSP 34]. The grammar was modified to handle the specialized language of clinical documents, and the English lexicon used by the parser was augmented with medical semantic attributes; see Medical Language Processing: Computer Management of Narrative Data, N. Sager with C. Friedman, M.S. Lyman, MD, and members of the LSP (Addison-Wesley, Reading, MA, 1987) [LSP 65]. The resulting Medical Language Processor (MLP) is documented on the LSP website.

Sager taught courses in natural language processing and directed the Linguistic String Project (LSP) at New York University from 1965 until her retirement in 1995. She lives in New York and spends part of each year in Paris. She was one of the translators from French into English of the autobiography of Ngo Van, a Vietnamese revolutionary who, while working in a factory in Paris, became an engineer, a published scholar, and the author of numerous works [Ngo Van, In the Crossfire: Adventures of a Vietnamese Revolutionary, AK Press, Oakland, CA, 2010].


In a 1967 article [LSP 1], Sager laid out the basis for language computation and described the first two implementations of the LSP parser and string grammar. The grammar was specified in two components: a set of formal rewriting rules written in Backus Normal Form (BNF), which provided the structure of the output parse tree, and a set of procedures, called restrictions, which operated on the parse tree to enforce detailed grammatical constraints [LSP 5]. An English-like programming language for writing restrictions, the Restriction Language (RL), was developed [LSP 12].

Applications of the LSP system drew upon Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules [LSP 11]. It was further shown that parsed documents could be mapped into sublanguage-labeled structures, called information formats, on which information retrieval procedures could operate [LSP 28]. The MLP concentrated on the sublanguage of clinical reporting (X-ray reports, hospital discharge summaries, and the like), demonstrating the automated application of health care criteria to narrative medical reports that had been mapped into information formats [LSP 30]. The fully implemented LSP string grammar and some of the initial applications were published in Sager's 1981 book [LSP 34]. The collective work of the LSP team on medical records was summarized in a 1987 volume [LSP 65]. Sager presented a general overview of methods and results at the New York Academy of Sciences in 1990 [LSP 78]. The ways in which Zellig Harris's contributions to linguistics were used in the development of the LSP system were described in the symposium dedicated to his work [LSP 91].

The LSP medical language processor was converted to French in a research collaboration with the Informatics group of the Cantonal Hospital of Geneva, Switzerland, under the direction of Jean-Raoul Scherrer [LSP 77] [LSP 76]. Subsequently, an XML hierarchy of medical knowledge tags was added to the system, along with an online viewer in which clinicians could see highlighted portions of documents pertaining to particular patient problems or therapies; this was demonstrated as part of Sager's keynote address at the Second International Conference on the Clinical Document Architecture, October 20-22, 2004, in Acapulco, Mexico.

 

Linguistic String Project

The Linguistic String Project (LSP) at New York University was one of the earliest research and development projects in the computer processing of natural (i.e., human) language. It was initiated by Naomi Sager at NYU in 1965 with a grant from the Office of Science Information Services (OSIS) of the National Science Foundation. At that time OSIS was seeking ways to give scientists rapid access to information in the expanding technical literature, and computer analysis of language that would facilitate pinpointed search and retrieval of requested information was one avenue it was pursuing.

The LSP approach was to begin with a parsing program that would obtain the syntactic relations among the words of a sentence, the basic structure of language-borne information. This entailed implementing a parsing algorithm (top-down, left-to-right, with calls on linguistic test procedures) first described by Sager in 1960 in "A Procedure for Left to Right Analysis of Sentence Structure," Report 27 of the series Transformations and Discourse Analysis Papers (TDAP), published by the Dept. of Linguistics, University of Pennsylvania.
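
The following Python sketch illustrates, under simplifying assumptions, what "top-down, left-to-right, with calls on test procedures" means in practice. It is not the LSP program: the grammar, the dictionary, the test hook `TESTS`, and all function names are hypothetical. The parser expands each grammar symbol from the top, consumes words from left to right, backtracks over alternatives, and keeps every analysis that spans the sentence, so a syntactically ambiguous sentence yields more than one parse.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Hypothetical toy grammar (not the LSP grammar): each nonterminal maps to its
# alternatives, tried in order. Symbols not in GRAMMAR are word categories.
GRAMMAR: Dict[str, List[List[str]]] = {
    "SENTENCE": [["NP", "VP"]],
    "NP": [["t", "n"], ["t", "n", "PP"]],
    "VP": [["tv", "NP"], ["tv", "NP", "PP"]],
    "PP": [["p", "NP"]],
}
# Hypothetical dictionary: word -> set of possible word categories.
LEXICON: Dict[str, Set[str]] = {
    "the": {"t"}, "nurse": {"n"}, "patient": {"n"}, "chart": {"n"},
    "saw": {"tv"}, "with": {"p"},
}

Tree = tuple  # (label, child, ...) for phrases, (category, word) for words
# Hypothetical test procedures keyed by nonterminal; each may veto a subtree.
TESTS: Dict[str, List[Callable[[Tree], bool]]] = {}


def parse(symbol: str, words: List[str], i: int) -> Iterator[Tuple[Tree, int]]:
    """Yield every (subtree, next position) for `symbol` starting at word i."""
    if symbol not in GRAMMAR:  # word category: consume exactly one word
        if i < len(words) and symbol in LEXICON.get(words[i], set()):
            yield (symbol, words[i]), i + 1
        return
    for alternative in GRAMMAR[symbol]:  # top-down expansion of the symbol
        for children, j in expand(alternative, words, i):
            tree = (symbol, *children)
            # Call the test procedures attached to this symbol, if any.
            if all(test(tree) for test in TESTS.get(symbol, [])):
                yield tree, j


def expand(symbols: List[str], words: List[str], i: int) -> Iterator[Tuple[List[Tree], int]]:
    """Match symbols left to right; nested generators provide backtracking."""
    if not symbols:
        yield [], i
        return
    for first, j in parse(symbols[0], words, i):
        for rest, k in expand(symbols[1:], words, j):
            yield [first, *rest], k


def all_parses(sentence: str) -> List[Tree]:
    """Return every analysis that spans the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("SENTENCE", words, 0) if end == len(words)]
```

Run on "the nurse saw the patient with the chart", the sketch returns two analyses, one attaching "with the chart" to "the patient" and one attaching it to the verb phrase. Registering a function in `TESTS` (for example, one that rejects any NP containing a PP) prunes analyses as they are built rather than after the fact. A grammar with left-recursive rules would need extra handling in this naive form; the toy grammar has none.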

A 1967 article, "Syntactic Analysis of Natural Language" (Advances in Computers 8:153-188, Academic Press, NY) [LSP 1], laid out the basis for language computation and described the first two implementations of the LSP parser and string grammar. The grammar was specified in two components: a set of formal rewriting rules written in Backus Normal Form (BNF), which provided the structure of the output parse tree, and a set of procedures, called restrictions, which operated on the parse tree to enforce detailed grammatical constraints; see "A Two-Stage BNF Specification of Natural Language," Journal of Cybernetics 2-3 (1972): 39-50 [LSP 5]. An English-like programming language for writing restrictions, the Restriction Language (RL), was developed; see "The Restriction Language for Computer Grammars of Natural Language," with Grishman, R., Communications of the ACM 18:390-400 [LSP 12]. The computer grammar of English that formed an integral part of the system was published in 1981; see Natural Language Information Processing: A Computer Grammar of English and Its Applications, Addison-Wesley, Reading, MA [LSP 34].
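
A minimal Python sketch of the two-component idea, assuming a drastically simplified grammar: the context-free rules (shown only as comments) determine the shape of the tree, and a separate restriction procedure walks that tree and consults word attributes to reject analyses the rules alone would accept. The symbol names, attribute names, and functions are hypothetical and are not taken from the LSP grammar or from the Restriction Language.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Component 1 (BNF-style rules, simplified and hypothetical) fixes only shape:
#   <ASSERTION> ::= <SUBJECT> <VERB> <OBJECT>
#   <SUBJECT> ::= N      <VERB> ::= V      <OBJECT> ::= N
# Both "nurse records pulse" and "nurses records pulse" satisfy these rules.


@dataclass
class Node:
    label: str                                   # grammar symbol, e.g. "SUBJECT"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # set only on word (leaf) nodes

    def child(self, label: str) -> Optional["Node"]:
        """First immediate child with the given label, if any."""
        return next((c for c in self.children if c.label == label), None)

    def first_word(self) -> Optional[str]:
        """Leftmost word dominated by this node."""
        if self.word is not None:
            return self.word
        for c in self.children:
            w = c.first_word()
            if w is not None:
                return w
        return None


# Hypothetical dictionary entries: word -> grammatical attributes.
# "records" is the 3rd-person singular verb form; "record" the plural form.
ATTRIBUTES = {
    "nurse": {"SINGULAR"}, "nurses": {"PLURAL"},
    "records": {"SINGULAR"}, "record": {"PLURAL"}, "pulse": {"SINGULAR"},
}
NUMBER = {"SINGULAR", "PLURAL"}


def agreement_restriction(assertion: Node) -> bool:
    """Component 2: a restriction that inspects the tree and checks
    subject-verb number agreement; returns False to reject the parse."""
    subject, verb = assertion.child("SUBJECT"), assertion.child("VERB")
    if subject is None or verb is None:
        return True  # restriction does not apply to this node
    s = ATTRIBUTES.get(subject.first_word(), set()) & NUMBER
    v = ATTRIBUTES.get(verb.first_word(), set()) & NUMBER
    if not s or not v:
        return True  # no number information: nothing to check
    return bool(s & v)


def assertion(subject: str, verb: str, obj: str) -> Node:
    """Build the tree that the BNF rules above would assign."""
    return Node("ASSERTION", [
        Node("SUBJECT", [Node("N", word=subject)]),
        Node("VERB", [Node("V", word=verb)]),
        Node("OBJECT", [Node("N", word=obj)]),
    ])
```

Both sentences receive the same tree shape from the rules; only the restriction distinguishes them: `agreement_restriction(assertion("nurse", "records", "pulse"))` returns `True`, while the same call with `"nurses"` returns `False`. Keeping agreement out of the rewriting rules avoids duplicating each rule for singular and plural forms.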

Applications of the LSP system drew upon Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules; see "Sublanguage grammars in science information processing," Journal of the American Society for Information Science 26 (1975): 10-16 [LSP 11]. It was further shown that parsed documents could be mapped into sublanguage-labeled structures, called information formats, on which information retrieval procedures could operate; see "Natural language information formatting: the automatic conversion of texts to a structured data base," in Advances in Computers 17 (M.C. Yovits, ed.), 89-162 (1978), Academic Press, NY [LSP 28]. The LSP concentrated on the sublanguage of clinical reporting (X-ray reports, hospital discharge summaries, and the like), demonstrating the automated application of health care criteria to narrative medical reports that had been mapped into information formats; see Hirschman, L., et al., "Automatic application of health care criteria to narrative patient records," Proceedings of the Third Annual Symposium on Computer Applications in Medical Care (R.A. Dunn, ed.), 105-113, IEEE, NY [LSP 30]. The collective work of the LSP team on medical records was summarized in a 1987 volume; cf. Medical Language Processing: Computer Management of Narrative Data, Sager, N., Friedman, C., Lyman, M.S., MD, and LSP members, Addison-Wesley, Reading, MA [LSP 65]. The LSP Medical Language Processor, including the medically specialized English grammar and dictionary, is available on the Linguistic String Project website.
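
The Python sketch below illustrates, under heavily simplified assumptions, how sublanguage constraints and information formats fit together: words carry subject-matter classes, only certain class combinations are accepted, and accepted subject-verb-object triples are mapped into labeled slots that a simple criterion can query. The word classes, patterns, slot names, and the criterion are all hypothetical, and this toy is far simpler than a real information format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical sublanguage word classes for a tiny clinical vocabulary.
WORD_CLASS: Dict[str, str] = {
    "patient": "PATIENT",
    "penicillin": "MED", "aspirin": "MED",
    "rash": "SYMPTOM", "fever": "SYMPTOM",
    "received": "GIVE", "developed": "SHOW", "reduced": "AFFECT",
}

# Hypothetical sublanguage patterns: accepted (subject, verb, object) class
# combinations, each mapped to the kind of format row it produces.
PATTERNS: Dict[Tuple[str, str, str], str] = {
    ("PATIENT", "GIVE", "MED"): "TREATMENT",
    ("PATIENT", "SHOW", "SYMPTOM"): "FINDING",
    ("MED", "AFFECT", "SYMPTOM"): "RESPONSE",
}

Row = Dict[str, str]


def format_triple(subject: str, verb: str, obj: str) -> Optional[Row]:
    """Map a parsed subject-verb-object triple into a labeled row, or return
    None when its word classes violate the sublanguage patterns."""
    classes = tuple(WORD_CLASS.get(w, "UNKNOWN") for w in (subject, verb, obj))
    kind = PATTERNS.get(classes)
    if kind is None:
        return None
    # Subject and object slots are labeled by sublanguage class, not by
    # syntactic role, so the same fact lands in the same slot either way.
    return {"TYPE": kind, classes[0]: subject, "RELATION": verb, classes[2]: obj}


def criterion_met(rows: List[Row], med: str, symptom: str) -> bool:
    """Toy criterion: the report records both giving `med` and `symptom`."""
    given = any(r["TYPE"] == "TREATMENT" and r.get("MED") == med for r in rows)
    seen = any(r["TYPE"] == "FINDING" and r.get("SYMPTOM") == symptom for r in rows)
    return given and seen
```

For example, formatting the triples ("patient", "received", "penicillin") and ("patient", "developed", "rash") yields a TREATMENT row and a FINDING row, and `criterion_met` on those rows with `"penicillin"` and `"rash"` returns `True`. By contrast, `format_triple("fever", "received", "patient")` returns `None`, because that class combination is not a sublanguage pattern.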

The LSP MLP was converted to French in a research collaboration with the Informatics group of the Cantonal Hospital of Geneva, Switzerland; see Nhàn, N.T., et al., "A medical language processor for two Indo-European languages," Proceedings of the 13th Annual Symposium on Computer Applications in Medical Care (SCAMC13), L.C. Kingsland, ed., IEEE Computer Society Press, Washington, D.C., 554-558 [LSP 77], and Borst, F., et al., "Analyse automatique de comptes rendus d'hospitalisation," Informatique et Santé, Informatique et Gestion des Unités de Soins, Comptes Rendus du Colloque AIM-IF, Paris, 1989, Degoulet, P., et al., eds., Paris, Springer-Verlag, 246-256 [LSP 76]. Subsequently, an XML hierarchy of medical knowledge tags was added to the system, along with an online viewer in which clinicians could see highlighted portions of documents pertaining to particular patient problems or therapies; this was demonstrated as part of Sager's keynote address at the Second International Conference on the Clinical Document Architecture, October 20-22, 2004, in Acapulco, Mexico.

A general overview of the methods and results of the LSP was presented by Sager at the New York Academy of Sciences in 1990; see "Computer analysis of sublanguage information structures," Annals of the New York Academy of Sciences 683: 161-179 [LSP 78]. The ways in which Zellig Harris's contributions to linguistics were used in the development of the LSP system were described by Sager and Ngô Thanh Nhàn in the symposium dedicated to Harris's work; see "The computability of strings, transformations, and sublanguage," in The Legacy of Zellig Harris, ed. by Nevin, B., et al., John Benjamins Publishing Co., Amsterdam, Vol. 2, Chapter 4, 79-120 [LSP 91].


PARSING

What could be easier than diagramming a sentence? Kids do it. But do they use a rigorous procedure, an algorithm, without intuition or built-in experience of language, as a computer must? Probably not. Yet without the syntactic relations among the words of a sentence there would be no information. The first task in what came to be called "natural language processing" was thus seen to be computer parsing. In addition to a parsing program and a computer grammar of English to drive it, a "dictionary" was needed that would provide each word with its parts of speech and grammatical attributes. Procedures to test the parse tree and the attributes of the words attached to it were also needed. These development tasks occupied the LSP group into the early 1970s.

 
MEDICAL LANGUAGE PROCESSING

Moving toward applications, it became clear that the general English grammar used by the LSP parser would have to be specialized for use in a particular subject area in order to arrive at a retrieval-ready representation of textual content. The theoretical basis for this specialization was found in Sublanguage Grammar, an extension of linguistic methods whereby the constraints on word combinations special to a subject matter are formalized into quasi-grammatical rules. The LSP focused on the sublanguage of clinical reporting as seen in hospital discharge summaries, clinical notes, admission histories, and the like. The medical sublanguage additions to the LSP system were largely funded by the National Library of Medicine of the National Institutes of Health.

When fully implemented, the LSP Medical Language Processor comprised five stages of processing, followed by a mapping of the output to a database for querying. Parsing (stage 1) was followed by Selection (stage 2), which filtered parses for medically compatible word choices within structures. Transformations (stage 3) effected structural rearrangements using established linguistic transformations. Regularization (stage 4) cast connective structure into Polish notation. Information formatting (stage 5) mapped the results of processing into a medically labeled structure called an "information format". Applications were demonstrated primarily in the area of medical quality assurance.
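
Stage 4 (Regularization) is the easiest stage to illustrate in isolation. The Python sketch below assumes that each connective (AND, BECAUSE, and so on) is binary and that the conjoined operands have already been reduced to simple labels; both are simplifications, and the class, set, and function names are hypothetical. Writing each connective before its operands (Polish, or prefix, notation) removes the need for brackets, and because every connective has a fixed number of operands, the token sequence can be read back into the same structure.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union

# Hypothetical set of binary connectives recognized by this sketch.
CONNECTIVES = {"AND", "OR", "BUT", "BECAUSE"}


@dataclass
class Conn:
    """A binary connective joining two regularized operands."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[str, Conn]  # an operand is a plain label or a nested connective


def to_polish(expr: Expr) -> List[str]:
    """Prefix (Polish) notation: each connective precedes its operands."""
    if isinstance(expr, str):
        return [expr]
    return [expr.op, *to_polish(expr.left), *to_polish(expr.right)]


def from_polish(tokens: List[str]) -> Expr:
    """Rebuild the structure; fixed arity means no brackets are needed."""
    stream: Iterator[str] = iter(tokens)

    def read() -> Expr:
        token = next(stream, None)
        if token is None:
            raise ValueError("too few operands")
        if token in CONNECTIVES:
            left = read()
            right = read()
            return Conn(token, left, right)
        return token

    expr = read()
    if next(stream, None) is not None:
        raise ValueError("extra tokens after a complete expression")
    return expr
```

For "fever and rash because of penicillin", the structure `Conn("BECAUSE", Conn("AND", "fever", "rash"), "penicillin")` becomes the token list `["BECAUSE", "AND", "fever", "rash", "penicillin"]`, and `from_polish` rebuilds the original structure from that list.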

The LSP-MLP was converted to French in a research collaboration with the Informatics group of the Cantonal Hospital of Geneva, Switzerland, during 1988-1992. The system was also adapted to German and to Dutch in Ph.D. thesis work. When funding ended, a number of LSP "alumni" found posts in hospitals that were adding medical language processing to their information systems, or in artificial intelligence research.
