Department of Computer Science
New York University
That is, the grammar is acquired fully automatically from a syntactically tagged corpus, instead of by the human labor or statistically aided human labor used in many conventional projects. Although this strategy has some problems, such as the availability of such a corpus and domain restrictions, the resulting grammar performs fairly well. The author believes the idea shows one of the promising directions for the future of natural language research.
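To make "acquisition of grammar from a tagged corpus" concrete, here is a minimal Python sketch of the generic idea: read PTB-style bracketed trees, count every rule, and estimate rule probabilities by relative frequency. This is the textbook treebank-grammar estimate, not necessarily APP's own probability formula, and the function names and the two toy trees are hypothetical.

```python
import re
from collections import Counter

def parse_bracketed(text):
    """Parse one PTB-style bracketed tree into nested (label, children) tuples.

    Assumes a single tree without the extra outer "( ... )" wrapper found in PTB files.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # a word under a part-of-speech tag
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def rules(tree):
    """Yield (lhs, rhs) for every phrasal node; POS-to-word rules are skipped."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)

def estimate(treebank):
    """Relative-frequency estimate P(rhs | lhs) = count(lhs -> rhs) / count(lhs)."""
    counts = Counter()
    for text in treebank:
        counts.update(rules(parse_bracketed(text)))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

TREEBANK = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (PRP it)) (VP (VBD saw) (NP (DT the) (NN cat))))",
]
probs = estimate(TREEBANK)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.3f}")
```

With these two trees, NP -> DT NN gets probability 2/3 and NP -> PRP gets 1/3, because three NP nodes were seen in total.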
The parser generates a syntactic tree in the same bracketing style as the Penn Treebank (PTB). Although the latest release of the PTB (Version 2.0) includes argument-structure labels, this parser does not produce them. APP also only tries to make the parse tree as accurate as possible for reasonable sentences, meaning, for example, sentences in newspapers or other well-written documents. Hence, it aims neither to parse ill-formed but still reasonable sentences (such as conversation) nor to reject absolutely ill-formed sentences. You may be surprised that the parser produces a parse tree for a sentence with number disagreement, or that it fails to parse a very simple English sentence correctly. Both are consequences of how APP is designed.
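Because APP's output carries no argument-structure labels, a PTB Version 2.0 tree has to be reduced to bare categories before it can be compared with the parser's output. A minimal sketch, assuming the usual PTB convention that function tags and co-index numbers follow the category after "-" or "=" (as in NP-SBJ-1 or NP=2); the helper names strip_function_tags and strip_tree are hypothetical.

```python
import re

def strip_function_tags(label):
    """Reduce a PTB v2 label such as NP-SBJ-1 or NP=2 to its bare category."""
    if label.startswith("-"):         # -NONE-, -LRB-, -RRB- are complete tags
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]

def strip_tree(bracketed):
    """Apply strip_function_tags to every node label in a bracketed string."""
    return re.sub(r"\(([^\s()]+)",
                  lambda m: "(" + strip_function_tags(m.group(1)),
                  bracketed)

gold = "(S (NP-SBJ (DT The) (NN dog)) (VP (VBD barked) (ADVP-LOC (RB outside))) (. .))"
print(strip_tree(gold))
```

Only labels directly after an opening bracket are rewritten, so words such as trace markers are left untouched.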
The author knows that APP's performance is not the best compared with the state-of-the-art parsers reported recently. However, the author believes the main difference between APP and those parsers is the use of lexical information. We are planning to incorporate this information into the parser and hope to release the new version soon.
Then:
APP Version 5.8 (October 2, 1996)
This was achieved by changing the formula for the probability calculation and by using 5 non-terminals (instead of the previous 2). You will find two S categories and two NP categories, namely S and SS, and NP and NPL; TOINF is also used as a non-terminal.
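The earlier two-non-terminal setup can be pictured as flattening a treebank tree so that only S and NP nodes survive and every other phrase (VP, PP, ...) is spliced into its parent, leaving long rules over POS tags plus S and NP. The sketch below is an illustration under that assumption, not APP's code; this page does not define SS, NPL or TOINF, so it keeps only S and NP. The names KEEP, flatten and rules are hypothetical.

```python
# A tree is (label, children); a preterminal is (POS tag, word).
KEEP = {"S", "NP"}   # hypothetical: the earlier two-non-terminal setup

def flatten(node, keep=KEEP):
    """Return a list of nodes: kept labels survive, other phrases are spliced into the parent."""
    label, rest = node
    if isinstance(rest, str):                 # preterminal: always kept
        return [node]
    kids = [k for child in rest for k in flatten(child, keep)]
    return [(label, kids)] if label in keep else kids

def rules(node):
    """Yield (lhs, rhs) for each remaining non-terminal node."""
    label, rest = node
    if isinstance(rest, str):
        return
    yield label, tuple(child[0] for child in rest)
    for child in rest:
        yield from rules(child)

tree = ("S", [
    ("NP", [("DT", "The"), ("NN", "dog")]),
    ("VP", [("VBD", "barked"),
            ("PP", [("IN", "at"),
                    ("NP", [("DT", "the"), ("NN", "cat")])])]),
])
(root,) = flatten(tree)   # the root S is kept, so exactly one node comes back
for lhs, rhs in rules(root):
    print(lhs, "->", " ".join(rhs))
```

Here the VP and PP disappear and the sentence yields the single flat rule S -> NP VBD IN NP, plus NP -> DT NN twice. Adding more categories to KEEP trades shorter, more general rules for a larger set of non-terminals.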
Current In-house Best (Version 6.3)
Recall / Precision are 79.55 / 77.18.
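Recall and precision here are presumably labeled-bracket scores in the PARSEVAL style: recall is the share of gold-standard brackets the parser reproduces, and precision is the share of the parser's brackets that are correct. A simplified sketch, assuming a bracket counts as correct when its label and word span both match a gold bracket and that POS-tag nodes are not scored; the exact evaluation settings behind the figures above are not stated here, and the helper names are hypothetical.

```python
from collections import Counter

def brackets(tree):
    """Collect labeled spans (label, start, end) of all phrasal nodes."""
    spans = Counter()

    def walk(node, start):
        label, rest = node
        if isinstance(rest, str):           # preterminal covers one word, not scored
            return start + 1
        end = start
        for child in rest:
            end = walk(child, end)
        spans[(label, start, end)] += 1
        return end

    walk(tree, 0)
    return spans

def recall_precision(gold, test):
    """Labeled-bracket recall and precision, in percent."""
    g, t = brackets(gold), brackets(test)
    matched = sum((g & t).values())         # multiset intersection of brackets
    return 100 * matched / sum(g.values()), 100 * matched / sum(t.values())

gold = ("S", [("NP", [("DT", "The"), ("NN", "dog")]),
              ("VP", [("VBD", "barked"),
                      ("PP", [("IN", "at"),
                              ("NP", [("DT", "the"), ("NN", "cat")])])])])
test = ("S", [("NP", [("DT", "The"), ("NN", "dog")]),
              ("VBD", "barked"), ("IN", "at"),
              ("NP", [("DT", "the"), ("NN", "cat")])])
print(recall_precision(gold, test))
```

In the example, the flatter parse misses the VP and PP brackets, so it scores 60.0 recall but 100.0 precision; the two numbers move independently, which is why both are reported.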
It uses lexical bigram information.
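This page does not say how the bigram statistics enter the parser. As a hedged illustration only, this is the standard relative-frequency estimate of word-bigram probabilities, P(w2 | w1) = count(w1 w2) / count(w1), with hypothetical sentence-boundary markers <s> and </s> and a hypothetical helper bigram_model.

```python
from collections import Counter

def bigram_model(sentences):
    """Return P(w2 | w1) estimated by relative frequency from tokenized sentences."""
    history, pairs = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]   # sentence boundary markers
        history.update(tokens[:-1])                 # every token that has a successor
        pairs.update(zip(tokens, tokens[1:]))

    def prob(w1, w2):
        # Unseen histories get probability 0 (no smoothing in this sketch).
        return pairs[(w1, w2)] / history[w1] if history[w1] else 0.0

    return prob

prob = bigram_model([
    "the dog barked".split(),
    "the cat slept".split(),
    "the dog slept".split(),
])
print(prob("the", "dog"), prob("dog", "slept"), prob("dog", "cat"))
```

A real system would add smoothing, since unsmoothed estimates assign zero probability to every word pair not seen in training.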
We hope to make further improvements and will then distribute that version.
However, if you would like to use this version, please contact me.
Satoshi Sekine
Ralph Grishman
`A New Direction for Sublanguage NLP'
`Automatic Sublanguage Identification for a New Text'