[ Project Goals, Project Members, Project Publications, Proteus, NYU, GE, Rutgers SCILS ]

NATURAL LANGUAGE INFORMATION RETRIEVAL

Continuing participation in the Text Retrieval Conference (TREC-5)

trec5@crd.ge.com

Principal Investigator: Tomek Strzalkowski

GE Research and Development Center
Bldg. K-1, rm 5C40
P.O. Box 8
Schenectady, NY 12301
phone (518) 387-6871
fax   (518) 387-6845

Participation Category: A

Team participants: GE, NYU, Rutgers, Lockheed Martin

GENERAL

This proposal continues the TREC participation of the GE/New York University/Rutgers group. This year we are joined by Lockheed Martin's Tipster/MUC team.

For previous TRECs, the GE/NYU group developed a prototype Natural Language Information Retrieval System that uses advanced linguistic processing techniques to enhance the effectiveness of traditional term-based document retrieval. The backbone of our system is a statistical information retrieval engine that performs automated indexing of documents, followed by search and ranking in response to user queries. This core architecture is augmented with robust natural language processing tools that process both database documents and user queries. These tools include a dictionary-assisted word stemmer, a part-of-speech tagger, a syntactic parser, a pattern-matcher, and a statistical program package for computing word and phrase correlations in a given text database. We believe that, when used properly, automated natural language processing could become a significant factor in bringing about a new generation of text retrieval systems.
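To make the architecture described above concrete, the following is a minimal sketch of a term-based engine (indexing, then search and ranking) with a language-processing step in front of it. It is not the project's code: the function names, the crude suffix-stripping stand-in for the dictionary-assisted stemmer, and the TF-IDF-style scoring formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def stem(word):
    # Crude suffix stripping. This is only a stand-in for the
    # dictionary-assisted stemmer mentioned in the text (assumption).
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    # Language-processing step: lowercase, tokenize, stem.
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower())]


def build_index(docs):
    # docs maps doc_id -> text. The result is an inverted index:
    # term -> {doc_id: term frequency in that document}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(terms(text)).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, num_docs):
    # Score each document by summing a TF-IDF-style weight over the
    # query terms (the weighting formula is an illustrative assumption).
    scores = defaultdict(float)
    for term in terms(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += (1 + math.log(tf)) * idf
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "Parsing queries and parsing documents",
    "d2": "Ranking documents",
    "d3": "Weather report",
}
index = build_index(docs)
print(rank("parsing documents", index, len(docs)))
```

Because queries and documents pass through the same `terms` function, any richer processing (tagging, parsing, phrase extraction) would slot in at that single point without changing the indexing or ranking code.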

Summary Of TREC-4 Participation

For TREC-4 we participated in the main evaluations, submitting two runs in each of the ad-hoc and routing categories. Our focus in 1995 was on the following:
  • Get better phrasal terms: we improved the structural disambiguation method for complex nominal groups.
  • Massive query expansion in routing: we updated and reimplemented the initial TREC-3 solution, obtaining a very effective module.
  • Manual query expansion and automatic processing: we used this technique effectively in ad-hoc runs.
  • Index partitioning experiments: we continued to explore this.
  • Name extraction: we extracted proper names, but did not categorize them.
  • New weighting scheme for phrases: we continued to improve and look for more alternatives.

Overall, our results were quite good, as we moved up in the rankings in all categories.
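As an illustration of the routing query expansion listed above, here is a minimal sketch of one generic way to expand a query from training documents with known relevance judgments. The text does not describe how the TREC-4 module worked, so the selection rule (terms that occur in a larger share of relevant than non-relevant documents), the function name `expand_query`, and the parameter `k` are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, relevant_docs, nonrelevant_docs, k=5):
    # Each document is a list of terms. Count in how many documents
    # of each set a term appears (document frequency).
    rel_df = Counter(t for doc in relevant_docs for t in set(doc))
    non_df = Counter(t for doc in nonrelevant_docs for t in set(doc))

    def score(term):
        # Share of relevant documents containing the term minus the
        # share of non-relevant documents containing it (assumption).
        rel_share = rel_df[term] / max(len(relevant_docs), 1)
        non_share = non_df[term] / max(len(nonrelevant_docs), 1)
        return rel_share - non_share

    # Keep only new terms that favor the relevant set.
    candidates = [
        t for t in rel_df if t not in query_terms and score(t) > 0
    ]
    # Best score first; ties broken alphabetically.
    candidates.sort(key=lambda t: (-score(t), t))
    return list(query_terms) + candidates[:k]


relevant = [
    ["retrieval", "parser", "phrase"],
    ["retrieval", "parser", "the"],
]
nonrelevant = [
    ["the", "weather"],
    ["the", "retrieval"],
]
print(expand_query(["retrieval"], relevant, nonrelevant, k=5))
```

A larger `k` gives a more aggressive expansion; terms that are common in both sets, such as "the" in this toy data, are filtered out by the score.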

    Publications and Reports

    Natural Language Information Retrieval: TREC-3 Report
    Tomek Strzalkowski, José Pérez-Carballo, Mihnea Marinescu

    Natural Language Information Retrieval: TREC-4 Report
    Tomek Strzalkowski and José Pérez-Carballo

    Assessed Relevance and Stylistic Variation
    Jussi Karlgren
    Abstract of poster presentation at ACM SIGIR, Zürich, August 19-21

    Stylistic Variation in an Information Retrieval Experiment
    Jussi Karlgren
    Paper presented at NeMLaP, Ankara, September 16-18
    Slides for the talk

    Visualizing Stylistic Variation
    Jussi Karlgren
    Paper to be presented at HICSS, Maui, January 7-10, 1997.

    Experiments in Stylistic Analysis
    Jussi Karlgren, Troy Straszheim
    A brief outline of the NYU contribution to TREC 96.

    Natural Language Information Retrieval: TREC-5 Report
    Tomek Strzalkowski, Louise Guthrie, Jussi Karlgren, Jim Leistensnider, Fang Lin, José Pérez-Carballo, Troy Straszheim, Jin Wang, Jon Wilding

    Experiments in Query Clustering
    Jussi Karlgren
    A brief outline of further failed experiments.

    Internal Project Documents

    ... for project members.

    Members

    General Electric:
    Tomek Strzalkowski, strzalkowski@crd.ge.com, (518) 387-6871
    Fang Lin, fangl@crd.ge.com
    Lockheed Martin:
    Louise Guthrie, guthrie@mdso.vf.ge.com, (610) 354-2504
    Jim Leistensnider
    Jon Wilding
    NYU:
    Jussi Karlgren, karlgren@cs.nyu.edu, (212) 998-3496
    Troy Straszheim, troys@cs.nyu.edu
    Rutgers:
    José Pérez-Carballo, carballo@carballo.rutgers.edu, (908) 932-0530


    Page maintained by Jussi Karlgren. Comments: karlgren@cs.nyu.edu.