Department of Computer Science
New York University
The Proteus Project at NYU has been involved in MT research for over a decade. Until a couple of years ago, most of our effort had focused on automatically generating MT systems from parsed bilingual texts ("bitexts"). We used the same parser and comparable grammars to analyze aligned sentences from two languages. From these aligned parses, our system automatically derived transfer rules, in the style of example-based MT. An early publication on this topic was Prof. Ralph Grishman's "Iterative Alignment of Syntactic Structures in a Bilingual Corpus". Further details have been published on aligning nodes in bilingual parse trees, acquiring transfer rules, and using those transfer rules to translate new text.
When Prof. Dan Melamed joined the Proteus Project in 2001, we began pursuing an ambitious new MT research program based on structured models of translational equivalence. This new approach builds on several recent breakthroughs in NLP and related fields, including our Empirical Methods for Exploiting Parallel Texts and the paradigm of statistical machine translation by parsing (see below). Automatic methods for inducing structured translation models have become very popular in the past year, and they now seem like a promising avenue of MT research.
Adam Meyers and Shasha Liao of NYU, along with Michiko Kosaka of Monmouth University and Nianwen Xue of Brandeis University, have recently revived our earlier work on deriving machine translation systems from both sides of linguistically analyzed bitexts. The initial results are presented in the paper "Improving MT Word Alignment Using Aligned Multi-Stage Parses".
GenPar Toolkit for Generalized Parsing (including MT by parsing)
I. Dan Melamed (2004). Statistical Machine Translation by Parsing [PS] [PDF]. Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), Barcelona, Spain. This version corrects a couple of typos. A longer, easier-to-understand version that has been submitted for publication is available here: [PS] [PDF]. For now, you can cite it as Proteus Project Technical Report #04-024.