Information Extraction Task Definition

1 Overview of Information Extraction Task

1.1 - Scenario-Dependent and Scenario-Independent Subtasks
1.2 - Evaluation Stages
1.3 - Levels of Template Structure

1.1 Scenario-Dependent and Scenario-Independent Subtasks

The overall goal of the Information Extraction (IE) task is to provide an evaluation of IE technology with reduced overhead and reduced non-NLP requirements, as compared to recent MUCs. To enforce the requirements for reduced overhead, the participant preparation for the evaluation will consist of two stages. The first stage will be scenario-independent, and will begin well in advance of the evaluation; the second stage of participant preparation, which is scenario-dependent, will start just one month prior to the evaluation.

As currently envisioned, the task to be performed during test week will consist of two subtasks:

SUBTASK 1.

Scenario Template evaluation, the so-called "Mini-MUC," is the traditional template-level subtask. Participants are evaluated on whether their templates contain exactly the instantiated objects and filled slots specified in the scenario definition (and reflected in the answer key), with penalties for spurious, missing, and wrong objects and slot fills.

SUBTASK 2.

Template Element evaluation, the so-called "predefined objects" evaluation. There are three types of Template Element objects: ORGANIZATION, PERSON, and ARTIFACT. One difficulty with the Scenario Template subtask is that it is subject to the "lynchpin" or "keystone" effect, where a wrong decision about whether to instantiate an object carries a high penalty (points off for each slot fill in that object under the All Objects scoring method). We can reduce the lynchpin effect by having a subtask which does not involve scenario-dependent relevance criteria. Furthermore, this subtask is viewed as an interesting exercise in its own right, as the next step up from the aggregation of the Named Entity task and the Coreference task.

For example, for the Template Element evaluation, an ORGANIZATION object and all possible slots defined for that object type are to be instantiated for each organization mentioned in a given text, even if a given Scenario Template task confines itself to organizations which are airplane manufacturers and requires an organization's nationality but not its location. For the Scenario Template evaluation, only those Template Element object and slot types that appear in the scenario task definition will be tested. ARTIFACT objects are handled somewhat differently from ORGANIZATION and PERSON objects. For ARTIFACT objects, the Scenario Template will define the particular kind of artifact to be reported and the particular slots to be used from the Template Element BNF for ARTIFACT. The Template Element test for ARTIFACT will be limited to that type of artifact and to the scenario-defined ARTIFACT slots.
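The difference between the two subtasks in the ORG-and-slots example above can be sketched in code. The following Python fragment is only an illustration under stated assumptions and is not part of the task definition: the slot names (`name`, `nationality`, `location`), the sample values, the `scenario_slots` mapping, and the function `restrict_to_scenario` are all hypothetical. It shows fully filled Template Element objects being reduced to the object and slot types that a scenario definition names. Scenario-dependent relevance criteria (for example, "only airplane manufacturers") are deliberately not modeled.

```python
from typing import Dict, List

# A Template Element object is modeled as a dict holding a "type" key plus
# slot fills. Slot names are illustrative assumptions, not the official set.
TemplateObject = Dict[str, str]


def restrict_to_scenario(
    objects: List[TemplateObject],
    scenario_slots: Dict[str, List[str]],
) -> List[TemplateObject]:
    """Keep only the object types and slots named in a hypothetical scenario definition."""
    restricted = []
    for obj in objects:
        allowed = scenario_slots.get(obj["type"])
        if allowed is None:
            continue  # this object type is not used by the scenario
        kept = {"type": obj["type"]}
        for slot in allowed:
            if slot in obj:
                kept[slot] = obj[slot]
        restricted.append(kept)
    return restricted


# Template Element output: every organization and person, all slots filled.
# The values below are invented purely for illustration.
template_elements = [
    {"type": "ORGANIZATION", "name": "Acme Aircraft",
     "nationality": "US", "location": "Seattle"},
    {"type": "PERSON", "name": "J. Smith"},
]

# Hypothetical scenario: wants an organization's nationality, not its location.
scenario_slots = {"ORGANIZATION": ["name", "nationality"]}

scenario_view = restrict_to_scenario(template_elements, scenario_slots)
```

Here `scenario_view` keeps the ORGANIZATION object without its `location` slot and drops the PERSON object, while `template_elements` remains the complete output that the Template Element evaluation would score.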

1.2 Evaluation Stages

STAGE 1 (with the announcement of the evaluation).

The participants are given the definitions for the scenario-independent and scenario-neutral template elements (defined in this document). The definitions in this document do not reflect the requirements of any particular scenario. The participants are also given one or more example IE scenario definitions and data sets, similar in nature (but not in content) to the Scenario Template task(s) to be used for the actual evaluation. During stage 1, it is expected that the participants will develop their systems to perform the Template Element evaluation subtask (especially for ORGANIZATION and PERSON objects) and will design their systems to accommodate the template design requirements of the Scenario Template task definitions to be released during stage 2 of the evaluation.

STAGE 2 (one month prior to test week).

The participants are given one or two scenario definitions. During the course of this one-month period, the participants configure their systems to produce the appropriate subset of the Template Elements and to produce the higher-level object(s) as defined in the scenario statement. The entire template for any given task is therefore fairly simple, consisting of one or more Template Element objects, only one scenario-specific (high-level) object, and perhaps a relational object. The number of slots (other than pointer slots) that do not come from the set of Template Elements will be five or fewer.
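The size limits stated for stage 2 can be expressed as a simple consistency check. The sketch below is a hypothetical illustration, not an official validation tool: the classes `SlotDef` and `ScenarioDefinition`, the function `check_scenario`, and all slot and object names are assumptions made for the example. It checks that a scenario definition has exactly one scenario-specific object type and at most five slots that are neither pointers nor taken from the Template Elements.

```python
from dataclasses import dataclass, field
from typing import List

MAX_NEW_SLOTS = 5  # limit on non-pointer slots not drawn from Template Elements


@dataclass
class SlotDef:
    name: str
    is_pointer: bool = False             # slot points to another object
    from_template_element: bool = False  # slot reused from the Template Element set


@dataclass
class ScenarioDefinition:
    scenario_object_types: List[str]
    slots: List[SlotDef] = field(default_factory=list)


def check_scenario(defn: ScenarioDefinition) -> List[str]:
    """Return a list of constraint violations (an empty list means none were found)."""
    problems = []
    if len(defn.scenario_object_types) != 1:
        problems.append("expected exactly one scenario-specific object type")
    new_slots = [
        s for s in defn.slots
        if not s.is_pointer and not s.from_template_element
    ]
    if len(new_slots) > MAX_NEW_SLOTS:
        problems.append(
            f"{len(new_slots)} new non-pointer slots exceeds the limit of {MAX_NEW_SLOTS}"
        )
    return problems


# Hypothetical scenario definition; every name here is invented.
example = ScenarioDefinition(
    scenario_object_types=["SCENARIO_EVENT"],
    slots=[
        SlotDef("organization", is_pointer=True),
        SlotDef("org_name", from_template_element=True),
        SlotDef("role"),
        SlotDef("date"),
    ],
)
problems = check_scenario(example)
```

Only `role` and `date` count toward the limit in this example, since pointer slots and slots taken from the Template Elements are excluded.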

1.3 Levels of Template Structure

Four levels of template objects are defined:

LEVEL 1 (Template Element).

The objects and slots defined in this document. These are generic Template Elements which may play a role in virtually any task scenario. These template elements are not oriented towards any particular task, but instead attempt to capture the sort of information that may be needed for a wide range of tasks. All of these objects are fairly simple and have no relational information (i.e., no pointers to other objects). For a given IE scenario, only a subset of the predefined Template Element objects will be used; in addition, one or more slots might be ignored from the Template Element objects that are used.

LEVEL 2 (Relational Object -- optional).

Objects which define a relation between generic Template Elements and scenario-specific ones. These relations are not included in the Template Element objects, for the purpose of generality and simplicity. For example, a Relational object may consist of a pointer to an ORGANIZATION object (generic), a pointer to a PERSON object (generic), a slot representing the role that the person has in that organization (scenario-specific), and, perhaps, a slot containing temporal information (generic).

LEVEL 3 (Scenario Template Object).

For each IE scenario, it is envisioned that there will be exactly one scenario-specific object type. It captures the essential relation or event of interest in the task. This object type will have pointers to the Template Element object types appropriate for the task, as well as pointers to any Relational objects defined for the task. It may also contain slots that are defined as part of the Template Element subtask.

LEVEL 4 (Top-Level Template Object).

For each text that is relevant to an IE scenario, there will be exactly one Top-Level Template object. It will identify the text and will contain one or more pointers to Scenario Template objects.
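The four levels can be pictured as a small set of nested data types. The following Python sketch is one possible reading of the description above, not the official template BNF: the class names `RoleRelation`, `ScenarioObject`, and `TopLevelTemplate`, and every slot name, are hypothetical. Python object references stand in for the pointers between objects.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# LEVEL 1: generic Template Elements, with no pointers to other objects.
@dataclass
class Organization:
    name: str                          # slot names are illustrative assumptions
    nationality: Optional[str] = None
    location: Optional[str] = None


@dataclass
class Person:
    name: str


# LEVEL 2 (optional): relates generic elements to scenario-specific information.
@dataclass
class RoleRelation:
    organization: Organization         # pointer to a generic object
    person: Person                     # pointer to a generic object
    role: str                          # scenario-specific slot
    time: Optional[str] = None         # optional temporal information


# LEVEL 3: the single scenario-specific object type for a scenario.
@dataclass
class ScenarioObject:
    organizations: List[Organization] = field(default_factory=list)
    people: List[Person] = field(default_factory=list)
    relations: List[RoleRelation] = field(default_factory=list)


# LEVEL 4: exactly one per relevant text.
@dataclass
class TopLevelTemplate:
    doc_id: str
    scenario_objects: List[ScenarioObject]

    def __post_init__(self) -> None:
        # The text calls for one or more pointers to Scenario Template objects.
        if not self.scenario_objects:
            raise ValueError("a Top-Level Template needs at least one scenario object")


# Invented sample values, for illustration only.
org = Organization(name="Acme Aircraft", nationality="US")
person = Person(name="J. Smith")
relation = RoleRelation(organization=org, person=person, role="chairman")
event = ScenarioObject(organizations=[org], people=[person], relations=[relation])
template = TopLevelTemplate(doc_id="doc-0001", scenario_objects=[event])
```

Keeping the Level 1 classes free of references to other objects mirrors the stated design goal: the relation between a person and an organization lives in the Level 2 object, so the generic elements stay reusable across scenarios.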


Information Extraction Task Definition - 14 JUN 95