The traditional natural language processing pipeline incorporates multiple stages of
linguistic analysis. Although errors typically compound as they propagate through the pipeline,
it is possible to reduce the errors made in one stage by harnessing the results of other stages.
This thesis presents a new framework, based on interactions among pipeline components, to achieve this goal. The framework applies all stages in a suitable order, with each stage generating multiple hypotheses and propagating them through the whole pipeline. Feedback from subsequent stages is then used to improve the target stage by re-ranking these hypotheses and selecting the best analysis.
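The re-ranking step can be illustrated with a minimal sketch. It assumes that the target stage emits an N-best list of scored hypotheses and that each downstream stage can score how well a hypothesis fits its own analysis; these feedback scores are combined linearly with the target stage's score. The names `Hypothesis`, `rerank`, and `coref_feedback`, the weights, and the toy "Jordan" example are hypothetical illustrations, not the thesis's actual implementation or combination method.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hypothesis:
    label: str         # one candidate analysis from the target stage (hypothetical format)
    base_score: float  # confidence assigned by the target stage alone


def rerank(hypotheses: List[Hypothesis],
           feedback: List[Callable[[Hypothesis], float]],
           weights: List[float]) -> List[Hypothesis]:
    """Re-rank N-best hypotheses using feedback from downstream stages.

    Assumption: each feedback function stands in for a later stage
    (e.g. coreference or relation extraction), and its score is added
    to the base score with a fixed weight (a simple linear combination).
    """
    def combined(h: Hypothesis) -> float:
        return h.base_score + sum(w * f(h) for w, f in zip(weights, feedback))

    # Highest combined score first.
    return sorted(hypotheses, key=combined, reverse=True)


# Toy N-best output from a hypothetical name tagger for the mention "Jordan".
nbest = [Hypothesis("Jordan/GPE", 0.6), Hypothesis("Jordan/PER", 0.4)]


# Hypothetical coreference feedback: the mention is later referred to as "he",
# which is only consistent with a person reading.
def coref_feedback(h: Hypothesis) -> float:
    return 1.0 if h.label.endswith("/PER") else 0.0


best = rerank(nbest, [coref_feedback], [0.5])[0]
print(best.label)  # Jordan/PER
```

On its own, the tagger prefers the GPE reading (0.6 vs. 0.4); with coreference feedback weighted at 0.5, the PER reading scores 0.9 and moves to the top. A linear combination is only the simplest option here; a learned re-ranker over the same feedback features would fit the same interface.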
The effectiveness of this framework is demonstrated by substantial performance improvements in Chinese and English entity extraction and in Chinese-to-English entity translation. The knowledge used for inference includes monolingual interactions among information extraction stages (name tagging, coreference resolution, relation extraction, and event extraction), as well as cross-lingual interactions between information extraction and machine translation.
This symbiosis of analysis components allows us to incorporate information from a much wider context, spanning the entire document and even crossing document boundaries, and to exploit deeper semantic analysis. It will therefore be essential for building a high-performance NLP pipeline.