NomBank is an annotation project at New York University that is related to the PropBank project at the University of Colorado.  Our goal is to mark the sets of arguments that co-occur with nouns in the PropBank Corpus (the Wall Street Journal Corpus of the Penn Treebank), just as PropBank records such information for verbs.  As a side effect of the annotation process, we are producing a number of other resources, including various dictionaries and PropBank-style lexical entries called frame files. These resources help the user label the various arguments and adjuncts of the head nouns with roles (sets of argument labels for each sense of each noun).  NYU and U of Colorado are making a coordinated effort to ensure that, when possible, role definitions are consistent across parts of speech. For example, we are using PropBank's frame file for the verb "decide" in our annotation of the noun "decision". However, our coordination goes far beyond that.

    We began this project standing firmly on the shoulders of Catherine Macleod's NOMLEX project and related work on support verbs. This turned out to be a big boost, as about one half of the argument-taking nouns in the corpus are nominalizations or nouns that have nominalization-like properties (e.g., "aggression" and "agenda" have argument structures similar to the verbs "destroy" and "schedule"). NomBank has forced us to define noun argument structure in great detail, including areas that have received little, if any, previous research.  So in many ways, we are breaking new ground. Some sample phenomena that we cover include: support verb constructions ("John MADE A DECISION"); arguments across copulas ("His decision WAS TO LEAVE"); and parenthetical PP constructions ("Trading in Cineplex Odeon Corp. shares was halted on the New York and Toronto stock exchanges late yesterday afternoon AT THE COMPANY'S REQUEST"). Each of these phenomena is interesting in that an argument of a noun may occur outside of the NP headed by that noun. In addition to nominalizations of verbs ("decision", "helper", "nominee"), we also cover nominalizations of adjectives ("incompetence", "ability", "wisdom"), relational nouns ("president", "friend", "father"), partitive nouns ("barrage", "clump", "variety"), and several other types of argument-taking nouns.


Executive of Logical Features (ELF) Adam Meyers
Editor of Logical Features (ELF) Ruth Reeves
Emboldened Linguistic Foremother (ELF) Catherine Macleod
Enterer of Logical Features (ELF) Rachel Szekely
Enterer of Logical Features (ELF) Veronika Zielinska
Enterer of Logical Features (ELF) Brian Young

NomBank.1.0 Release (tgz, zip):

    On December 17, 2007, we released NomBank.1.0 covering all the "markable" nouns in the Penn Treebank II Wall Street Journal corpus.   This release includes a total of 114,576 propositions derived from looking at a total of 202,965 noun instances and choosing only those nouns whose arguments occur in the text. An additional 35,000 nouns were not looked at because those nouns never take arguments.

    This release completes NomBank.1.0. During the final stages, we completed our analysis of the remaining NOMLEX classes (we had saved the rarer classes for the final stages) and performed extensive quality control based on syntactic patterns and lexical information. We also updated all dictionaries and all specifications related to this effort. For previous releases, we looked at about 21,000 instances tagged as likely errors. For this release, we looked at an additional 13,000 instances.

    This release also includes substantial updates of the documentation to include more details about NOMBANK classes, especially about NOMBANK entries derived from adjectives and adverbs. An additional document entitled "Those Other NomBank Dictionaries" is also included in order to provide clearer descriptions of the supplemental dictionaries included with NomBank.

    NomBank can be downloaded in the form of the attached tgz or zip file. Alternatively, you can browse the directory linked here.

    Please note that we are in the process of determining how to distribute some additional files that require licenses from the LDC. These include: (i) COMNOM, an update of COMLEX Syntax to include noun complements extracted automatically from NOMLEX-plus (the LDC holds the license to COMLEX Syntax); and (ii) a human-friendly version of NomBank that includes over 100,000 snippets from the Wall Street Journal corpus. We will update this website with instructions for obtaining these as soon as we can.

Previous Releases:  We have removed preliminary releases of NomBank from our website.

Future Releases:  We anticipate annotating additional data as part of the Unified Linguistic Annotation project. In addition, NomBank will contribute to the 2008 CONLL task and will create a small amount of additional annotation in connection with it.

NomBank Specifications can be viewed here, but are also included in the nombank tgz and zip archives. We have also created separate specifications for some of the other dictionaries created with NomBank. That document, entitled Those Other NomBank Dictionaries, can be viewed here or downloaded as part of one of the archive (tgz or zip) files.  Alternatively, click here for a quick user's guide that assumes some familiarity with PropBank.

Resource Guidelines for Use with the Penn Treebank Training Corpus:

These guidelines apply to participants in the CONLL 2008 task and in other tasks that split the Penn Treebank into training sections (2-21) and the remaining sections (e.g., section 23 for testing and section 24 for development). Since NOMLEX-PLUS and COMNOM (see Those Other NomBank Dictionaries) involve information derived from corpus annotation, these users need to take care in how they use these resources. For both dictionaries, there are "training" versions (where corpus information is derived only from the training sections) and "clean" versions, which should be used in place of the normal versions if this division of the corpus is to be maintained. For other uses, e.g., gigabytes of non-WSJ data, there is no particular reason to stick to the training or clean versions of these resources.

To clarify, CONLL 2008 participants are not permitted to use the normal versions of NOMLEX-PLUS or COMNOM (to be distributed by the LDC). If they want to use these resources, they can only use the clean and training versions (NOMLEX-plus-clean.1.0, NOMLEX-plus-training.1.0, COMNOM-clean.1.0, COMNOM-training.1.0).
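
As a rough illustration of the division described above, the following Python sketch maps a Wall Street Journal section number to its role in the split and checks whether a resource name is one of the four versions listed above. The function and constant names are hypothetical and are not part of the NomBank distribution; only the section numbers and resource names are taken from the guidelines.

```python
# Hypothetical helper names; only the section numbers and the four
# resource names come from the guidelines above.
TRAINING_SECTIONS = range(2, 22)  # sections 2-21 inclusive
TEST_SECTION = 23
DEVELOPMENT_SECTION = 24

# The versions that keep the training/test division intact.
SPLIT_SAFE_RESOURCES = (
    "NOMLEX-plus-clean.1.0",
    "NOMLEX-plus-training.1.0",
    "COMNOM-clean.1.0",
    "COMNOM-training.1.0",
)


def wsj_split(section: int) -> str:
    """Return the role of a WSJ section in the split described above."""
    if section in TRAINING_SECTIONS:
        return "training"
    if section == TEST_SECTION:
        return "testing"
    if section == DEVELOPMENT_SECTION:
        return "development"
    # The guidelines do not assign a role to the remaining sections.
    return "other"


def is_split_safe(resource_name: str) -> bool:
    """True if the name is one of the clean or training versions."""
    return resource_name in SPLIT_SAFE_RESOURCES
```

For example, wsj_split(23) yields "testing", and is_split_safe("COMNOM-clean.1.0") is true, while any name outside the four listed versions is rejected.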


NomBank was supported under Grants N66001-001-1-8917 and N66001-04-1-8920 from the Space and Naval Warfare Systems Center San Diego and National Science Foundation grant CNI-0551615.  The reports and papers of the NomBank project do not necessarily reflect the position or the policy of the U.S. Government.

