MT Decider Benchmark Q4/2022 now available

The Q4/2022 edition of the MT Decider Benchmark with the latest comparison of machine translation quality with Amazon Translate, DeepL, Google Translate and Microsoft Translator is out!

  • With the addition of English↔Korean the benchmark now covers 25 language pairs.
  • Differences in how online MT services handle quotation marks significantly distort BLEU scores. For the benchmark we now apply quote normalization before calculating BLEU scores. As a result, COMET and BLEU agree more often on the best service for a language direction, letting you choose the best MT service with confidence.
  • We kept test data fresh by updating to 2021 data, where available.
  • We used the latest evaluation libraries, sacreBLEU 2.3.1 and COMET 1.1.3, incorporating the latest innovations and bug fixes from academic research.
  • Instead of naming the benchmark with the quarter when machine translations were captured, we now name it with the quarter when the benchmark reports are compiled. This is why MT Decider Benchmark Q4/2022 follows MT Decider Benchmark Q2/2022. We plan to continue the quarterly publication of the benchmark.
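The quote normalization mentioned above can be illustrated with a short sketch. This is not the benchmark's actual code; it is a minimal example, assuming a simple character-mapping approach, of how typographic quotes can be mapped to plain ASCII quotes before BLEU scoring (e.g. before passing hypotheses and references to sacreBLEU), so that quote style alone cannot change the score:

```python
# Illustrative sketch (not the benchmark's actual implementation):
# map typographic quote characters emitted by different MT services
# to plain ASCII quotes before computing BLEU.

QUOTE_MAP = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # curly double quotes “ ”
    "\u2018": "'", "\u2019": "'",   # curly single quotes ‘ ’
    "\u00ab": '"', "\u00bb": '"',   # guillemets « »
    "\u201e": '"', "\u201a": "'",   # low-9 quotes „ ‚
})

def normalize_quotes(text: str) -> str:
    """Replace typographic quote characters with ASCII equivalents."""
    return text.translate(QUOTE_MAP)

# Two MT outputs that differ only in quote style now compare equal:
a = normalize_quotes("She said \u201chello\u201d.")
b = normalize_quotes('She said "hello".')
print(a == b)  # True
```

With both hypothesis and reference normalized this way, n-gram overlap is no longer penalized when one service prefers curly quotes and another ASCII quotes.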

Last, but not least, here is the Q4 MT Decider Index, ranking MT services across all language pairs covered by the benchmark:

Q4/2022  Q2/2022  Change  MT Service
1        1        –       Google Translate
2        3        ↑1      DeepL
3        2        ↓1      Microsoft Translator
4        4        –       Amazon Translate

DeepL ranks very well for the language pairs it supports. Google wins because it supports more language pairs. See details for the language directions you care about in the MT Decider Benchmark. You can download a trial report for the French→English language direction.

The MT Decider Benchmark offers reliable machine translation quality metrics and data at a lower cost than running the evaluation yourself. For an evaluation on your own data, please contact