Machine Translation

Proteus Project

Department of Computer Science
New York University

Automatic ("machine") translation between natural languages has long been the Holy Grail of research in Natural Language Processing (NLP). Warren Weaver speculated about "mechanical translation" in the late 1940s, when digital computers were still in their infancy. Today, automatic translation services abound on the Web, but they are more often a source of entertainment than a source of useful information. High-quality machine translation in unrestricted domains of discourse remains an elusive goal despite half a century of research. Meanwhile, the need for machine translation (MT) keeps growing.

The Proteus Project at NYU has been involved in MT research for over a decade. Until a couple of years ago, most of our effort focused on automatically generating MT systems from parsed bilingual texts ("bitexts"). We used the same parser and comparable grammars to analyze aligned sentences from two languages. From these aligned parses, our system automatically derived transfer rules, in the style of example-based MT. An early publication on this topic was Prof. Ralph Grishman's "Iterative Alignment of Syntactic Structures in a Bilingual Corpus". More details have been published on aligning nodes in bilingual parse trees, acquiring transfer rules, and using the transfer rules to translate new text.

When Prof. Dan Melamed joined the Proteus Project in 2001, we began pursuing an ambitious new MT research program, based on structured models of translational equivalence. This new approach builds on several recent breakthroughs in NLP and related fields, including our Empirical Methods for Exploiting Parallel Texts and the paradigm of statistical machine translation by parsing (see below). Automatic methods for inducing structured translation models have become very popular in the past year, and they now seem like a promising avenue of MT research.

Adam Meyers and Shasha Liao of NYU, along with Michiko Kosaka of Monmouth University and Nianwen Xue of Brandeis University, have recently revived the earlier work on deriving machine translation systems from bitexts that are linguistically analyzed on both sides. The initial work is presented in the paper "Improving MT Word Alignment Using Aligned Multi-Stage Parses".

Available Software

GenPar Toolkit for Generalized Parsing (including MT by parsing)

Recent Publications