Hybrid Translation Technology

Basic Translation Technologies

Currently, machine translation is represented by two main technologies:

a) Rule-based machine translation (RBMT) and
b) Statistical machine translation (SMT).

RBMT is built on a linguistic description of two natural languages (bilingual dictionaries and other databases containing morphological, grammatical and semantic information), formal grammars and translation algorithms. Translation quality depends on the size of the linguistic databases (dictionaries) and on the depth of the language description: the system has to account for as many features of grammatical structure as possible.
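
As a toy illustration of the rule-based idea, the sketch below combines a tiny bilingual dictionary with a single reordering rule (adjective before noun becomes noun before adjective, as when going from English to Spanish). The dictionary entries, the part-of-speech tags and the function name are invented for this example; a real RBMT system relies on far richer morphological, grammatical and semantic descriptions.

```python
# Hypothetical mini-dictionary: English word -> (Spanish word, part of speech).
DICTIONARY = {
    "the": ("el", "DET"),
    "red": ("rojo", "ADJ"),
    "car": ("coche", "NOUN"),
    "runs": ("corre", "VERB"),
}


def translate_rule_based(sentence):
    """Translate word by word, then apply one reordering rule."""
    tagged = []
    for word in sentence.lower().split():
        # Unknown words are passed through unchanged with an UNKNOWN tag.
        tagged.append(DICTIONARY.get(word, (word, "UNKNOWN")))

    # Grammar rule (assumed for this toy language pair): ADJ NOUN -> NOUN ADJ.
    i = 0
    while i < len(tagged) - 1:
        if tagged[i][1] == "ADJ" and tagged[i + 1][1] == "NOUN":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return " ".join(word for word, _ in tagged)


print(translate_rule_based("The red car runs"))  # el coche rojo corre
```

Even in this small form, the output is only as good as the dictionary and the rules: a word missing from the dictionary is left untranslated, and any construction not covered by a rule keeps its source-language order.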

SMT is based on comparing large volumes of parallel texts and calculating the most probable translation. SMT is self-training, and translation quality in this approach depends directly on the volume of parallel data available for training.
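
The idea of "calculating the most probable translation" can be sketched with relative frequencies. The example below assumes that word-level alignments have already been extracted from a parallel corpus; the aligned pairs and function names are hypothetical, and real SMT systems learn the alignments themselves and work with phrases and much larger data.

```python
from collections import Counter, defaultdict

# Hypothetical word-aligned pairs (source word, target word) taken from a
# parallel corpus; a real system would have to learn these alignments.
ALIGNED_PAIRS = [
    ("house", "casa"), ("house", "casa"), ("house", "hogar"),
    ("book", "libro"), ("book", "libro"), ("book", "reservar"),
    ("book", "libro"),
]


def estimate_translation_probs(pairs):
    """Relative-frequency estimate of P(target | source)."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    probs = {}
    for source, target_counts in counts.items():
        total = sum(target_counts.values())
        probs[source] = {t: c / total for t, c in target_counts.items()}
    return probs


def most_probable(probs, source):
    """Return the target word with the highest estimated probability."""
    candidates = probs[source]
    return max(candidates, key=candidates.get)


probs = estimate_translation_probs(ALIGNED_PAIRS)
print(most_probable(probs, "book"))   # libro
print(probs["book"]["libro"])         # 0.75
```

With only a handful of pairs the estimates are coarse, which is why the quality of this approach grows with the amount of parallel data.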

Both technologies have advantages and disadvantages, but, as of yet, mankind has not realized the dream of creating the "perfect" automatic translator.

Hybrid Translation System

The two approaches are moving toward each other: statistical MT aspires to use linguistic data to improve translation quality, while the "classical" rule-based approach looks to apply statistical techniques in its own technology.

In particular, PROMT, a developer of rule-based MT systems (RBMT), has been researching the use of statistical techniques in machine translation for more than two years and, as a result of this research, has developed a hybrid translation technology.

The hybrid technology combines RBMT with statistical techniques for the following:

Automatic creation of dictionaries from parallel corpora
Generation of several translation variants at the lexical and sentence-structure levels
Automatic post-editing
Selection of the best (most probable) translation
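
The last step in the list above, choosing the most probable of several translation variants, can be sketched with a small target-language model. This is an assumption made for illustration: the source does not say how PROMT scores its variants. The sample corpus, the candidate sentences and the function names are hypothetical, and the scoring is a plain bigram model with add-one smoothing.

```python
import math
from collections import Counter

# Hypothetical target-language sample used to train the toy language model.
TARGET_CORPUS = [
    "the contract is signed",
    "the contract is valid",
    "the agreement is signed",
]


def train_bigram_model(sentences):
    """Count unigrams and bigrams, with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    score = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        score += math.log(numerator / denominator)
    return score


def select_best(candidates, unigrams, bigrams):
    """Pick the candidate the language model considers most probable."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


unigrams, bigrams = train_bigram_model(TARGET_CORPUS)
# Two variants, as might be produced by different sentence-structure rules.
candidates = ["the contract signed is", "the contract is signed"]
print(select_best(candidates, unigrams, bigrams))  # the contract is signed
```

In this arrangement the rules generate the variants and the statistics only rank them, so every candidate is already a product of the grammar; the language model contributes fluency rather than structure.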

This approach allows for the following:

Maintaining the benefits of rule-based technology (syntactically coherent and grammatically correct text, terminological consistency, etc.)
Gaining the benefits of statistical MT (fast learning, automatic data acquisition from parallel corpora, text fluency, etc.)

It is worth noting that while statistical machine translation requires enormous volumes of parallel texts, the hybrid technology can work with relatively small volumes.

In practice, the hybrid technology can be used both by enterprises that already have volumes of parallel texts and by online services, provided that the Internet community is invited to take part in creating parallel corpora online.

Each implementation of the new PROMT solution will be carried out as an individual project. Its first users are expected to appear in 2011 in the American market.