Information Retrieval and Extraction Exercise

(IR & IE Contest for Japanese language)

Last modified: February 14, 1999


An IR (Information Retrieval) and IE (Information Extraction) contest for the Japanese language is planned. The first contest will be run in a "grass roots" style; international participants are welcome, provided that they share the following objectives:
  • To contribute to the improvement of the field
  • To widen the research area and circle of researchers
  • To increase the amount of available corpora and databases
  • To promote this kind of effort in the future

    There is no fee to participate. However, if you participate in the IR task, you have to buy a newspaper corpus, "Mainichi Shinbun (94, 95)", which is sold by a company, "Nichigai-associates" (about $1,000 for each year). For NE participants, Mainichi Shinbun will kindly provide the test articles free of charge. Also, most of the information (definitions, etc.) will be distributed in Japanese; please do not expect everything to be translated into other languages. (Sekine will privately assist you in some cases.) Note that this is not a contest using the NACSIS collection.


    Two tasks are planned. Anyone can participate in one or both tasks.

  • NE: Named Entity Task
    Basically, it is similar to the MUC-NE or MET task. There are minor differences; for example, an "artifact" category, which covers product names, names of services, etc., is added. Also, there will be two kinds of tests: one on general domain texts (60-70 articles), and the other on specific domain texts (30-40 articles). The domain will be announced about two weeks before the test period.
  • IR: Information Retrieval
    Basically, it is similar to the TREC ad hoc task. The target is to retrieve about 300 relevant documents from two years of newspaper articles. There will be about 30 topics.

    These tasks are designed for technology evaluation rather than for commercial purposes. For example, many people have pointed out that the interface is an important issue in IR; however, such issues should be addressed in a future IREX.


    June 30, 1998 : Distribute draft version of definitions, sample data
    July 31, 1998 : Preliminary application due (this is not a hard deadline)
    September 16, 1998 : Close the discussion for the definitions
    == Dry-run ==
    November 9, 1998 : IR topics distribution
    November 16, 1998 : IR result due
    November 17, 1998 : NE text distribution
    November 20, 1998 : NE result due
    February 28, 1999 : Application due
    == Formal-run ==
    April 5, 1999 : Distribute IR queries
    April 12, 1999 : IR result due (JST 23:59)
    April 13, 1999 : Freeze NE system development
    May 13, 1999 : Distribute NE tasks
    May 17, 1999 : NE result due (JST 23:59)
    September, 1999 : Workshop (planned, in Tokyo)

    More Information (in Japanese - EUC)


    Please send a signed registration form to Dr. Isahara (CRL) by March 15, 1999. (This is not a hard deadline; please contact us if you have individual concerns.) The address is shown on the form.


    Organized by : IREX Committee
    Mailing list :
    Co-chair : S.Sekine (NYU), H.Isahara (CRL)
    Advisor : M.Nagao (Kyoto-U), H.Tanaka (TITech), R.Grishman (NYU), T.Ishikawa (ULIS), D.Harmon (NIST), H.Iida (SONY)
    Committee Member : T.Tokunaga (TITech), S.Kurohashi (Kyoto-U), M.Okumura (JAIST), C.Nobata (U-Tokyo), K.Kita (Tokushima-U), K.Inui (KIT), Y.Nakagawa (YNU), A.Fujii (ULIS), T.Wakao (TAO), N.Kando (NACSIS), K.Hashida (ETL), E.Sumita (ATR), M.Murata, K.Uchimoto (CRL), N.Noguchi (Matsushita), A.Okumura, S.Fukushima (NEC), Y.Ogawa (RICOH), T.Sakai (Toshiba), J.Fukumoto (Oki), T.Kitani, Y.Eriguchi (NTT Data), S.Nakawatase (NTT), J.Tomiura (Mitsubishi), R.Ochitani (Fujitsu), S.Ogino (IBM)