
MT Decider Index Q2/2022

As a machine translation user, you want to use the online MT service with the highest translation quality for each language pair you are translating. But evaluating ever-changing MT services across many language pairs is hard!

Polyglot Technology solves this challenge by producing the MT Decider Benchmark, a vendor-independent, transparent, and up-to-date evaluation of online MT services every quarter for 24 language pairs. 

The MT Decider Index is a cross-language ranking distilled from the MT Decider Benchmark. This is the MT Decider Index for the second quarter of 2022:

  1. Google Translate
  2. Microsoft Translator
  3. DeepL
  4. Amazon Translate

The MT Decider Benchmark Q2/2022 launches in mid-August. To be notified when the MT Decider Benchmark is available, please sign up with your email address for updates here:

To learn about TAUS DeMT Evaluate, an evaluation service and report jointly created by TAUS and Polyglot Technology, the MT Decider Index and the MT Decider Benchmark, please attend this Nimdzi Live August 3rd webcast with Anne-Maj van der Meer (TAUS), Amir Kamran (TAUS), myself, Achim Ruopp (Polyglot Technology) and the host Tucker Johnson.

MT Decider Index Details

Test Data

The Conference on Machine Translation (WMT), a yearly evaluation of MT systems by academia since 2006, provides a great set of high-quality news translations in 23 language pairs: Czech↔German, German↔English, German↔French, English↔Spanish, English↔Estonian, English↔Finnish, English↔French, English↔German, English↔Hungarian, English↔Italian, English↔Lithuanian, English↔Latvian, English↔Polish, English↔Romanian, English↔Gujarati, English↔Hindi, English↔Tamil, English↔Japanese, English↔Chinese, English↔Kazakh, English↔Turkish, English↔Pashto and English↔Russian. This is the test data used to create the language pair ranking.
 
The MT Decider Benchmark also includes evaluations for transcribed spoken language data from the International Conference on Spoken Language Translation (IWSLT), available for some language pairs. Evaluations for the IWSLT datasets were omitted from the MT Decider Index to create a true apples-to-apples cross-language ranking.

Evaluated Online MT Services

Polyglot evaluated globally available online MT services that users can sign up for with a credit card, with a focus on providers in the Americas and Europe: Amazon Translate, DeepL, Google Translate and Microsoft Translator. Polyglot is interested in including additional providers in the MT Decider Index in the future, provided that the services are available and accessible globally, and that users can sign up for them with a credit card.

The services were evaluated on 22nd and 23rd June 2022.

The evaluated services support all evaluated language pairs, with the exception of DeepL, which as of the evaluation dates did not support the following language pairs: English↔Gujarati, English↔Hindi, English↔Tamil, and English↔Kazakh.

Evaluation Method and Ranking Calculation

Polyglot uses the established, two-decade-old BLEU evaluation metric for machine translation to rank the MT services. The language pair-specific rankings are consolidated across all language pairs using the Borda Count, a ranked voting system, to produce a consensus result - the MT Decider Index.
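
As an illustration of how a Borda Count can consolidate per-language-pair rankings, here is a minimal Python sketch. It is not Polyglot's actual implementation: the service names, the language pair labels and the rankings are hypothetical placeholders, and two details are assumptions made for the example: a service that does not support a language pair earns no points for that pair, and ties are broken alphabetically.

```python
from collections import defaultdict


def borda_count(rankings, services):
    """Consolidate per-language-pair rankings into one consensus ranking.

    rankings: dict mapping a language pair to a list of services, best first
              (for example, ordered by BLEU score on that pair).
    services: list of all services taking part in the consensus ranking.
    """
    n = len(services)
    points = defaultdict(int)
    for ranking in rankings.values():
        for position, service in enumerate(ranking):
            # The best-ranked service earns n - 1 points, the next n - 2, etc.
            points[service] += n - 1 - position
    # Assumption: a service missing from a ranking (unsupported pair)
    # simply earns no points for that pair.
    # Assumption: ties are broken alphabetically by service name.
    return sorted(services, key=lambda s: (-points[s], s))


# Hypothetical data for illustration only, not actual benchmark results.
services = ["Service A", "Service B", "Service C", "Service D"]
rankings = {
    "pair-1": ["Service A", "Service B", "Service C", "Service D"],
    "pair-2": ["Service B", "Service A", "Service D", "Service C"],
    "pair-3": ["Service A", "Service C", "Service B"],  # Service D unsupported
}

consensus = borda_count(rankings, services)
for rank, service in enumerate(consensus, start=1):
    print(f"{rank}. {service}")
```

With this toy input, Service A collects 8 points, Service B 6, Service C 3 and Service D 1, so the consensus order is A, B, C, D. Other ways of scoring unsupported language pairs are possible and could change the outcome.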

Because the MT Decider Index is a consensus result, it is a good pick when choosing just one MT service to use across all language pairs for convenience. Of course, you can achieve better quality by picking the best MT service for each language pair - language pair rankings are available in the MT Decider Benchmark.
 
The MT Decider Benchmark includes evaluation of the MT services with the novel MT evaluation method COMET, developed by Unbabel. COMET takes the source text of a translation into account and aims to judge translations more on their semantic content. COMET has recently shown close correlation with human judgments of translation quality.
 
Edit 3-August-2022: added credit to Unbabel for COMET development