G22.2590 - Natural Language Processing - Spring 2006 Prof. Grishman
Lecture 12 Outline
April 18, 2006
Discourse. Until now we have considered the structure and meaning
of sentences in isolation. We now turn to issues primarily connected
with multi-sentence text -- discourse.
Reference Resolution (J&M 18.1)
- referent: real-world object being referred to
- referring expression: a portion of text referring
to that object (we shall also refer to these as mentions of the object)
- discourse entities: the set of objects referred
to by a text
- coreference: two expressions -- antecedent
and anaphor -- referring to the same thing
- the first mention of an object in a discourse evokes the corresponding
discourse entity
Types of referring expressions
- definite pronouns (he, she, it, ...): generally anaphoric
- but 'it' has non-referring usages: "It is raining." "It is
unlikely that he will come."
- indefinite pronouns (one): can be modified ('the green one')
- definite NPs (the car): generally anaphoric
- indefinite NPs (a car): generally evoke a new discourse entity
- may also be generic: "Giraffes are beautiful creatures."
- names: named entities can later be referred to by portions of the name
- inferrables: sometimes the relation between anaphor and antecedent
is not one of identity ...
"I entered the room and looked at the ceiling."
- zero anaphora: sometimes the anaphor is implicit
- many languages allow subject omission, and some allow omission of other
arguments (e.g., Japanese)
- some cases of inferrable anaphora can be described in terms of PPs with
"IBM announced the appointment of Fred as president [of IBM]."
- expressions can also refer to events, propositions, ...
- "Fred claimed that no one programs in Lisp. That is
Resolving pronoun reference
- constraints: number and gender agreement
- preferences: recency, grammatical role (reference to a subject is
preferred over reference to an object)
- implementation: associate score with preferences; select
antecedent of highest score satisfying constraints
(can incorporate preferences into search order -- Hobbs' search order)
- selectional constraints (can be learned from a large corpus)
- accuracy is fairly good -- somewhere in the 80s (percent)
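The constraint-and-preference scheme above can be sketched as follows. This is a minimal illustration, not Hobbs' algorithm: the `Mention` fields, the weights in `ROLE_SCORE`, and `RECENCY_PENALTY` are all assumptions chosen for the example, and the candidate list is assumed to contain only mentions that precede the pronoun.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    sentence: int   # index of the sentence containing the mention
    role: str       # "subject", "object", or "other"
    number: str     # "sg" or "pl"
    gender: str     # "m", "f", or "n"

# Hypothetical preference weights; a real system would tune these on data.
ROLE_SCORE = {"subject": 2.0, "object": 1.0, "other": 0.0}
RECENCY_PENALTY = 1.5  # subtracted for each sentence of distance

# Agreement features of a few pronouns; None means "no gender constraint".
PRONOUNS = {
    "he": ("sg", "m"),
    "she": ("sg", "f"),
    "it": ("sg", "n"),
    "they": ("pl", None),
}

def resolve_pronoun(pronoun, pronoun_sentence, candidates):
    """Return the highest-scoring candidate that satisfies the constraints."""
    number, gender = PRONOUNS[pronoun.lower()]
    best, best_score = None, float("-inf")
    for m in candidates:
        # Constraints: the candidate must not follow the pronoun's sentence
        # and must agree in number and gender.
        if m.sentence > pronoun_sentence:
            continue
        if m.number != number:
            continue
        if gender is not None and m.gender != gender:
            continue
        # Preferences: grammatical role and recency combined into one score.
        distance = pronoun_sentence - m.sentence
        score = ROLE_SCORE.get(m.role, 0.0) - RECENCY_PENALTY * distance
        if score > best_score:
            best, best_score = m, score
    return best

candidates = [
    Mention("Jack", 0, "subject", "sg", "m"),
    Mention("Mary", 0, "object", "sg", "f"),
    Mention("the car", 1, "object", "sg", "n"),
]
antecedent = resolve_pronoun("she", 1, candidates)
print(antecedent.text if antecedent else None)
```

Agreement is treated as a hard filter and the preferences as soft scores, so a distant subject can still lose to a nearer object if the recency penalty outweighs the role bonus.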
Resolving other referring expressions
- names: generally quite straightforward -- look for prior name
of which this is a substring
- common noun phrases: generally quite hard
- deciding if an NP is anaphoric
- deciding if an NP description is consistent with an antecedent
(for example, we may use different nouns to describe the same
entity -- "the soldier", "the Marine", etc.)
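The name rule above might be sketched like this. One assumption is swapped in: matching is done on whole tokens rather than raw character substrings, so that "Fred" does not match "Frederick". The function name `find_name_antecedent` is hypothetical.

```python
def find_name_antecedent(name, prior_names):
    """Return the most recent prior name containing `name` as a
    contiguous sequence of tokens, or None if there is no such name."""
    tokens = name.split()
    if not tokens:
        return None
    n = len(tokens)
    # Search from the most recent name backwards.
    for prior in reversed(prior_names):
        prior_tokens = prior.split()
        for i in range(len(prior_tokens) - n + 1):
            if prior_tokens[i:i + n] == tokens:
                return prior
    return None

prior = ["Fred Smith", "International Business Machines", "Mary Smith"]
print(find_name_antecedent("Business Machines", prior))
print(find_name_antecedent("Smith", prior))
```

The second call shows the weakness of the rule: a bare surname is linked to the most recent matching full name, which is not always the intended one.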
Anaphora resolution in Jet
- assumes the only referring expressions are noun groups
- generates entity annotations corresponding to discourse entities
- anaphora resolution consists of linking each noun group to a new
or existing entity
- simple rule for common noun phrases -- only checks for matching head
- performed by resolve operation (typically done after all
- results are displayed in a separate entity window (which appears,
along with the regular document viewer, if entities have been generated)
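The linking step described above can be illustrated with a small sketch. This is not Jet's code or API: the `resolve` function here, the last-token head finder, and the rule that indefinite noun groups always start a new entity are assumptions made for the example.

```python
# Determiners assumed to signal a new discourse entity.
INDEFINITE = {"a", "an", "some"}

def resolve(noun_groups):
    """Link each noun group to a new or existing entity.

    Returns a list of entities, each a dict holding the head noun and
    the mentions linked to it, in order of first mention.
    """
    entities = []
    for ng in noun_groups:
        tokens = ng.lower().split()
        head = tokens[-1]  # crude assumption: the head is the last token
        target = None
        if tokens[0] not in INDEFINITE:
            # Matching-head rule: link to the most recent entity
            # whose head noun is the same.
            for entity in reversed(entities):
                if entity["head"] == head:
                    target = entity
                    break
        if target is None:
            target = {"head": head, "mentions": []}
            entities.append(target)
        target["mentions"].append(ng)
    return entities

for entity in resolve(["a car", "the driver", "the car", "a car"]):
    print(entity["head"], entity["mentions"])
```

A head-only match will wrongly merge distinct objects that share a head noun and will miss cases such as "the soldier" and "the Marine", which is why common noun phrases are described above as hard.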
Using anaphora resolution for extraction: an example
In many cases, we want to be able to retrieve an argument from context when
it is not part of the immediate syntactic structure. A simple way of
doing this is to generate a zero anaphor (an ngroup constituent not spanning
any text) and then let reference resolution map it to an entity. We
have created a version of
AppointPatterns that uses this method to collect
organization names and, in some cases, person names.
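The zero-anaphor idea can be sketched as follows. This does not show Jet's annotations or the AppointPatterns rules themselves: the `Entity` and `Span` classes, the type labels, and the "most recent entity of the required type" rule are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    name: str
    etype: str             # e.g. "organization" or "person"
    last_mention_end: int  # character offset where its latest mention ends

@dataclass
class Span:
    start: int
    end: int
    etype: str  # type of entity this mention must resolve to

def zero_anaphor(position, etype):
    """Create a mention that spans no text at the given offset."""
    return Span(position, position, etype)

def resolve_zero(mention, entities):
    """Map a zero anaphor to the most recently mentioned preceding
    entity of the required type, or None if there is none."""
    candidates = [
        e for e in entities
        if e.etype == mention.etype and e.last_mention_end <= mention.start
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.last_mention_end)

text = "IBM announced record profits. Fred was appointed president."
ibm_end = text.index("IBM") + len("IBM")
fred_end = text.index("Fred") + len("Fred")
entities = [
    Entity("IBM", "organization", ibm_end),
    Entity("Fred", "person", fred_end),
]
# The appointment pattern found no organization, so place a zero
# anaphor right after "president" and let resolution fill it in.
gap = zero_anaphor(text.index("president") + len("president"), "organization")
print(resolve_zero(gap, entities).name)
```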
Discourse Analysis: Analyzing Text Coherence (J&M 18.2)
Why are we interested in analyzing the structure of a discourse beyond
the sentence level?
- resolve ambiguities from earlier stages of processing (syntactic,
semantic, anaphoric analysis)
- establish intersentential connections:
- contextual (inferrable) reference
- Most of these connections are implicit, but they are an important
part of the information which the hearer/reader is expected to glean from
the discourse. A discourse lacking such connections is "incoherent".
- I walked from the kitchen into the living room and looked at the
ceiling. [inferrable reference]
- Just before dawn, the Valian sighted the Zwiebel and fired two torpedoes.
It sank swiftly, leaving few survivors.
- Jack poisoned Sam. He died within a week. vs.
- Jack poisoned Sam. He was arrested within a week.
How to analyze text coherence?
- First we need to define the criteria of text coherence
- different types of text have different goals and organizing principles,
so across all text types it is possible to define only very general criteria
- more specific criteria can be defined for particular genres:
exposition, narrative, ...
- most work has been done on analysis of narrative
- In analyzing the connection between two sentences S1 and S2, we use
'world knowledge' to generate expectations arising out of S1, and try to
match these against S2
- if there are multiple interpretations of S2, we may succeed in
matching only some of them against the expectations, thus disambiguating
S2 (e.g., the anaphora examples just above)
- More generally, we can apply abductive methods: given the prior
discourse and world knowledge, what minimal set of additional assumptions
need one make to infer the current sentence? (J&M 697-704)
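The expectation-matching idea can be illustrated on the "Jack poisoned Sam" examples. This is a toy sketch rather than an abductive reasoner: the `EXPECTATIONS` table stands in for world knowledge, and its contents, the role labels, and the `disambiguate` function are all assumptions made for the example.

```python
# Hypothetical world knowledge: for an event in S1, the predicates we
# expect to hold later of each participant role.
EXPECTATIONS = {
    "poison": {
        "agent": {"arrested", "convicted"},
        "patient": {"died", "sick"},
    },
}

def disambiguate(s1_event, s1_roles, s2_predicate, candidates):
    """Keep the candidate referents whose role in S1 generates an
    expectation that the predicate of S2 matches."""
    expected = EXPECTATIONS.get(s1_event, {})
    return [
        c for c in candidates
        if s2_predicate in expected.get(s1_roles.get(c), set())
    ]

roles = {"Jack": "agent", "Sam": "patient"}  # from "Jack poisoned Sam."
print(disambiguate("poison", roles, "died", ["Jack", "Sam"]))
print(disambiguate("poison", roles, "arrested", ["Jack", "Sam"]))
```

When no expectation matches, the function returns an empty list, and a fuller system would fall back on the general preferences such as recency and grammatical role.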