SMT is based on analyzing large volumes of parallel texts (source texts paired with their translations) and calculating the most probable translation of a given input. SMT systems learn automatically from data, so translation quality with this approach depends directly on the volume of parallel data available for training. Most SMT engines today are built on Moses, a free, open-source statistical machine translation engine.
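One common way to formalize "the most probable translation" is the noisy-channel formulation from the SMT literature. It is given here as general background, not as PROMT's specific model. For a source sentence $f$, the system searches for the target sentence $\hat{e}$:

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

The denominator $P(f)$ is the same for every candidate $e$, so it can be dropped. $P(f \mid e)$ is supplied by a translation model and $P(e)$ by a language model of the target language. Moses-style systems generalize this product into a weighted log-linear combination of several such scores.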

Following the industry trend, PROMT studied Moses and developed its own approach to training an SMT engine on a given parallel corpus.

Key Components:

  • Translation Model. Essentially, this is a table with three columns: a source n-gram, a target n-gram, and the probability that the source n-gram is translated as the target n-gram. The Translation Model is used to generate translation candidates.
  • Language Model (LM). The LM is a statistical model of the target language that assigns probabilities to n-grams in that language. It is built from target-language corpora and used to evaluate the translation candidates generated by the Translation Model.
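The interplay of the two components can be illustrated with a minimal sketch. The phrase table, its probabilities, and the target corpus below are invented for illustration, and the function names (`train_bigram_lm`, `lm_logprob`, `best_translation`) are hypothetical. The sketch assumes a bigram LM with add-one smoothing and combines the log translation probability and the LM log-probability with equal weights. Real engines such as Moses use more features with tuned weights.

```python
import math
from collections import Counter

# Hypothetical toy Translation Model: source n-gram -> {target n-gram: P(target | source)}.
# Entries and probabilities are invented for illustration only.
TRANSLATION_MODEL = {
    "das haus": {"the house": 0.4, "the home": 0.5, "house the": 0.1},
}

def train_bigram_lm(corpus):
    """Count unigrams and bigrams in a target-language corpus, with sentence boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def lm_logprob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

def best_translation(source, corpus):
    """Score each TM candidate by log P(target | source) + LM log-probability; return the best."""
    unigrams, bigrams = train_bigram_lm(corpus)
    candidates = TRANSLATION_MODEL[source]  # the TM proposes the candidates
    scores = {
        target: math.log(p_tm) + lm_logprob(target, unigrams, bigrams)  # the LM evaluates them
        for target, p_tm in candidates.items()
    }
    return max(scores, key=scores.get), scores

# Invented target-language corpus used to build the toy LM.
target_corpus = [
    "the house is small",
    "the house is big",
    "we like the house",
    "my home is old",
]

best, scores = best_translation("das haus", target_corpus)
print(best)  # the house
```

Here the translation model alone slightly prefers "the home", but the language model has seen "the house" far more often in the target corpus. The combined score therefore picks "the house", and the word-order error "house the" is ranked last.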

Key Advantages:

  • Fast, fully automated engine training (in most cases, a language-independent process)
  • Fluency (since the MT output is built from fragments of human-translated texts)
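Part of the reason training can be automatic and language-independent is that translation probabilities are estimated by counting. The following minimal sketch shows relative-frequency estimation of P(target | source) from phrase pairs. It assumes the pairs have already been extracted from a word-aligned parallel corpus, which is a separate step not shown. The pairs and the function name `estimate_phrase_table` are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_phrase_table(phrase_pairs):
    """Relative-frequency estimate: P(target | source) = count(source, target) / count(source)."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(source for source, _ in phrase_pairs)
    table = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        table[source][target] = count / source_counts[source]
    return dict(table)

# Hypothetical phrase pairs, as if extracted from a word-aligned parallel corpus.
pairs = [
    ("das haus", "the house"),
    ("das haus", "the house"),
    ("das haus", "the home"),
    ("ist klein", "is small"),
]

table = estimate_phrase_table(pairs)
print(table["das haus"])  # {'the house': 0.6666666666666666, 'the home': 0.3333333333333333}
```

No rule in this code depends on the languages involved. The same counting works for any language pair once aligned data is available.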

Because an SMT engine is trained on the texts it is given, PROMT trains a dedicated, client-specific SMT engine for each client. PROMT uses its own algorithms to correct the alignment in the client's corpora and to take the specifics of the target language into account.
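PROMT's alignment-correction algorithms are not described here, so the following is not PROMT's method. It is a minimal sketch of one common, generic cleanup heuristic for parallel corpora: discard sentence pairs whose lengths make a correct alignment unlikely. The function name `filter_sentence_pairs` and the thresholds `max_ratio=3.0` and `max_len=80` are hypothetical illustrative values.

```python
def filter_sentence_pairs(pairs, max_ratio=3.0, max_len=80):
    """Keep sentence pairs whose token lengths look plausible for a correct alignment.

    max_ratio and max_len are illustrative thresholds, not values used by PROMT.
    """
    kept = []
    for source, target in pairs:
        src_len, tgt_len = len(source.split()), len(target.split())
        if src_len == 0 or tgt_len == 0:
            continue  # an empty side usually signals a broken alignment
        if src_len > max_len or tgt_len > max_len:
            continue  # very long segments are often several sentences merged together
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue  # a large length mismatch suggests the pair is not a translation
        kept.append((source, target))
    return kept

# Invented sample of a client's sentence-aligned corpus.
sample_pairs = [
    ("das haus ist klein", "the house is small"),
    ("guten morgen", ""),
    ("ja", "yes, and the meeting has been moved to next thursday afternoon"),
]

print(filter_sentence_pairs(sample_pairs))  # [('das haus ist klein', 'the house is small')]
```

A length filter only removes obviously broken pairs. Correcting misalignments, as the text describes, requires more than this and is not modeled here.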

