
Healthcare MT with Google AutoML Translation

To help people make the right choices for their healthcare, the U.S. Centers for Medicare & Medicaid Services provide the site https://www.healthcare.gov/ as an information hub. The Centers try to reach many language communities, which is especially important for an aging population. With Spanish being the native language of roughly 13% of the US population, the most effort is put into a Spanish version of the site: https://www.cuidadodesalud.gov/.

Reading information on a website is only the first step toward getting health insurance: navigators, assisters, partners, agents and brokers help people sign up. Wouldn't it be great if these people had a customized MT system available to communicate with people who need insurance? Such an MT system could also provide initial translations of English content that is not (yet) translated, as well as post-editing drafts for translators working on healthcare and health insurance information for this site or elsewhere.
How do we go about training and evaluating such a custom healthcare MT system?

Data

We need highly specific English-Spanish parallel data to train and evaluate the custom MT, and what better place to get this than the healthcare.gov/cuidadodesalud.gov website itself? We can use the Bitextor tool to crawl the site and build a TMX file. For this specific site, Bitextor, after applying some tweaks, produces a TMX file with 7,866 segments. De-duplicating this by source and target leaves us with 5,440 segments containing 113,251 English source words.
This is about 40% fewer words than on the English site itself. This could be due to not all of the English content being translated to Spanish, or to overly aggressive cleaning and boilerplate removal by Bitextor. In addition, some segments that originated from bulleted lists are merged together into single segments. Separating them without splitting up other sentences is not an easy problem. We will work with what we have. Here is a small sample of the data:

A sample TMX file is downloadable here.
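The de-duplication and word count described above can be sketched in a few lines of Python. This is only an illustration, not the script used for this post: the function names, the whitespace-based word count and the tiny inline TMX sample (made up here, not taken from the crawl) are all assumptions.

```python
import xml.etree.ElementTree as ET

# TMX stores the language of each <tuv> in the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up miniature TMX for illustration only (not data from the crawl).
SAMPLE_TMX = """<tmx version="1.4">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Get coverage</seg></tuv>
      <tuv xml:lang="es"><seg>Obtenga cobertura</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Get coverage</seg></tuv>
      <tuv xml:lang="es"><seg>Obtenga cobertura</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>See if you can enroll</seg></tuv>
      <tuv xml:lang="es"><seg>Vea si puede inscribirse</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx_pairs(tmx_text, src_lang="en", tgt_lang="es"):
    """Return (source, target) text pairs from TMX content given as a string."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        texts = {}
        for tuv in tu.iter("tuv"):
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Key by the primary language subtag, so "en-US" counts as "en".
                texts[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in texts and tgt_lang in texts:
            pairs.append((texts[src_lang], texts[tgt_lang]))
    return pairs


def dedupe_pairs(pairs):
    """Drop segments whose source AND target both repeat an earlier segment."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def count_source_words(pairs):
    """Count whitespace-separated source words (an assumed counting method)."""
    return sum(len(src.split()) for src, _ in pairs)


unique_pairs = dedupe_pairs(read_tmx_pairs(SAMPLE_TMX))
print(len(unique_pairs), "segments,", count_source_words(unique_pairs), "source words")
```

Word counts depend on the tokenization, so a different counting tool may report a somewhat different total for the same file.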

Training a Custom Google AutoML Translation Model

If we wanted to build an MT system from scratch, we could download one of the widely available open source neural machine translation (NMT) toolkits like OpenNMT and fine-tune an English-Spanish baseline system with our healthcare data. That there aren't pre-trained English-Spanish baseline systems available is only the first obstacle we would have to overcome with this do-it-yourself approach. There are many more: hardware needs, software setup, pre-processing, post-processing, system training, operation, integration, ...

These days it is easier to use one of an increasing number of MT solutions that offer domain-adaptation. In no particular order: Slate, Globalese, Systran, Microsoft Custom Translator, IBM Watson Language Translator, Google AutoML Translation, SDL Machine Translation, ModernMT, KantanMT, Omniscien Technologies, Tilde Machine Translation and Iconic Translation Machines. This field is changing quickly. For a recent overview see this presentation by Konstantin Savenkov from Intento.

Because it is backed by a great research team, because we can customize an engine with the 5000+ segments we have, and because it is easy to sign up, we choose Google AutoML Translation. If you aren't yet a Google Cloud customer, you can sign up with a $300 credit. With the latest version (v3) of the Google Cloud Translation API, translation of the first 500,000 characters per month is free.

We can just upload the TMX file to Google AutoML Translation. The upload de-duplicates the data set by source text only, leaving 5,302 segments. Why this is necessary isn't entirely clear. Maybe to reduce ambiguity when fine-tuning the custom system on a small data set? Once we start training a custom model, AutoML Translation selects 4,772 segments as domain-adaptation training data and 530 segments as test data (we didn't specify a test set ourselves). After just over 1¼ hours, at a cost of $97.24, the model training is finished:

Our custom model is more than 9 BLEU points better than standard Google Translate. 7% of all custom translations are now identical to the human translations, up from 0% for the standard Google Translate translation. Success!
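To make the data preparation before training more concrete, here is a rough sketch of source-only de-duplication followed by a random hold-out split. How AutoML Translation actually picks its test segments is not documented here, so the random shuffle, the 10% fraction (chosen because it is consistent with the 4,772/530 numbers above), keeping the first translation of each source, and all function names are assumptions.

```python
import random


def dedupe_by_source(pairs):
    """Keep only the first (source, target) pair seen for each source text."""
    seen = set()
    unique = []
    for src, tgt in pairs:
        if src not in seen:
            seen.add(src)
            unique.append((src, tgt))
    return unique


def split_train_test(pairs, test_fraction=0.1, seed=0):
    """Shuffle reproducibly and hold out a fraction of the pairs as test data."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder segments standing in for the real data set.
demo_pairs = [(f"source {i}", f"target {i}") for i in range(5302)]
train, test = split_train_test(dedupe_by_source(demo_pairs))
print(len(train), "training segments,", len(test), "test segments")
```

Holding out your own fixed test set instead of relying on an automatic split makes it easier to compare different systems on exactly the same segments.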

Let's have a look at some of the improved translations:

Example 1
Source Text: Once you're enrolled, your plan can't deny you coverage or raise your rates based only on your health.
Human Translation: Una vez que esté inscrito, su plan no puede negarle cobertura o elevar sus tarifas basándose sólo en su salud.
Custom AutoML Translation: Una vez que esté inscrito, su plan no puede negarle cobertura o aumentar sus tarifas basado únicamente en su salud.
Standard Google Translate: Una vez que está inscrito, su plan no puede negarle la cobertura ni aumentar sus tarifas solo en función de su salud.

Example 2
Source Text: Note: Most health plans sold outside Open Enrollment don't count as qualifying health coverage.
Human Translation: Nota: La mayoría de los planes médicos que se comercializan fuera de Inscripción Abierta no califican como cobertura calificada.
Custom AutoML Translation: Nota: La mayoría de los planes de salud que se venden fuera del Período de Inscripción Abierta no cuentan como cobertura calificada.
Standard Google Translate: Nota: la mayoría de los planes de salud que se venden fuera de la Inscripción abierta no cuentan como cobertura de salud calificada.

From human evaluation by native Spanish speakers it is clear that the custom system is better at using the healthcare.gov terminology and some of the phrasing on the site. You can access the full test set here - comments are enabled and feedback is welcome!

But there is still room for improvement. Maybe the new glossary feature could enforce the correct terminology even more consistently? We will investigate this in a future post.

Also, the tone of the Spanish human translation is sometimes more formal than the English source. There are some ways to ensure a more formal tone, see "Controlling Politeness in Neural Machine Translation via Side Constraints" by Sennrich, Haddow and Birch, but they are not part of Google Translate (yet?).

Evaluation

Relying on the BLEU score and a brief review of the translations, even by a native speaker, is often not enough to be confident in the use of the custom MT system. For in-depth human evaluation we can use a tool like TAUS DQF Tools to evaluate adequacy and fluency of the machine translations and rank human, custom and standard machine translations.

Beyond the common automatic (BLEU) and human (adequacy/fluency, ranking) evaluation, additional metrics need to be adapted to the use case. Measure what matters.
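For readers who have not looked inside BLEU, the following is a minimal sketch of a corpus-level score with a single reference per segment. It is a simplified illustration only: it tokenizes on whitespace and applies no smoothing, so its numbers will not match the scores reported by AutoML Translation or by standard evaluation tools. The function names are made up for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU (0-100), one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum((h_ng & r_ng).values())
            totals[n - 1] += sum(h_ng.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any n-gram order without matches gives 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the reference.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


human = ["Una vez que esté inscrito, su plan no puede negarle cobertura "
         "o elevar sus tarifas basándose sólo en su salud."]
custom = ["Una vez que esté inscrito, su plan no puede negarle cobertura "
          "o aumentar sus tarifas basado únicamente en su salud."]
print(round(corpus_bleu(custom, human), 1))
```

A single sentence is far too little data for a meaningful BLEU score; the metric is intended for whole test sets such as the 530 held-out segments.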

For post-editing, some automated metrics become important: Translation Edit Rate (TER) and zero-edit segments (the 7% of custom MT segments identical to the human translation). In recent years the community has also established best practices to instrument the post-editing process and annotate errors. We'll discuss these in detail in an upcoming blog post on how to integrate the custom system into a post-editing process.
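Both post-editing metrics can be approximated with a few lines of Python. Note that this is a simplification: real TER also allows block shifts of word sequences, which this sketch leaves out, so it computes a plain word-level edit rate (insertions, deletions and substitutions divided by the number of reference words). Whitespace tokenization and the function names are assumptions.

```python
def word_edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(ref_tokens) + 1))
    for i, h in enumerate(hyp_tokens, 1):
        curr = [i]
        for j, r in enumerate(ref_tokens, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(hypotheses, references):
    """Total word edits divided by total reference words (TER without shifts)."""
    edits = sum(word_edit_distance(h.split(), r.split())
                for h, r in zip(hypotheses, references))
    ref_words = sum(len(r.split()) for r in references)
    return edits / ref_words if ref_words else 0.0


def zero_edit_share(hypotheses, references):
    """Share of MT segments that are identical to the human translation."""
    pairs = list(zip(hypotheses, references))
    if not pairs:
        return 0.0
    return sum(h.strip() == r.strip() for h, r in pairs) / len(pairs)


# Placeholder segments for illustration.
mt_output = ["a b c", "x y"]
human_refs = ["a b d", "x y"]
print(edit_rate(mt_output, human_refs), zero_edit_share(mt_output, human_refs))
```

In a real post-editing setup the reference would be the post-edited version of the MT output rather than an independent human translation, which usually gives lower edit rates.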

If we consider the output quality of the custom MT system good enough to be used without editing, e.g. for written support communication between people needing health insurance and people helping them, other metrics come into play. For example: What percentage of the support requests are resolved without the escalation to a Spanish speaker? Do people find the information that is relevant to their needs?

Data Privacy

The questions that always come up with cloud solutions like Google AutoML Translation are: Is my data safe in the cloud? Will Google use my translations to improve their own MT engine? Google answers these in the Cloud Translation Data Usage FAQ and the AutoML Data Usage FAQ.

In the latter, the question "Does Google use my data for improving Google products?" is answered with "Currently, Google does not use the content you send to train and improve our features.". Of course "Currently" qualifies the answer - there might be a point in the future where Google would like to use the training data. I'm confident they would give ample notice before this would happen.

Also, because it is relevant to the healthcare domain, here is the document about HIPAA Compliance on Google Cloud Platform covering the Cloud Translation API and Cloud AutoML Translation.

Conclusion

If you have a translation memory available that is very similar to the material you want to translate, Google AutoML Translation is a very good, cost-efficient way to train a custom MT system with much better translation quality than standard Google Translate. This better machine translation quality can open up efficiencies and new use cases in both post-editing machine translations and using them in their raw form. Additionally, custom MT models can be easily integrated using existing integrations of Google Translate in a variety of environments like content management tools, translation environments, etc.
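For integrations that call the API directly, a request to a custom model through the Cloud Translation API v3 Python client (the google-cloud-translate package) might look roughly like the sketch below. The project ID and model ID are placeholders, the us-central1 location is an assumption about where the AutoML model is hosted, and the helper names are made up; check the current Google documentation before relying on the details.

```python
def model_path(project_id, model_id, location="us-central1"):
    """Build the resource name of a custom model (IDs here are placeholders)."""
    return f"projects/{project_id}/locations/{location}/models/{model_id}"


def translate_with_custom_model(text, project_id, model_id, location="us-central1"):
    """Translate English text to Spanish with a custom AutoML model.

    Needs the google-cloud-translate package and Google Cloud credentials,
    so the import is kept inside the function.
    """
    from google.cloud import translate

    client = translate.TranslationServiceClient()
    response = client.translate_text(
        request={
            "parent": f"projects/{project_id}/locations/{location}",
            "contents": [text],
            "mime_type": "text/plain",
            "source_language_code": "en",
            "target_language_code": "es",
            "model": model_path(project_id, model_id, location),
        }
    )
    return [t.translated_text for t in response.translations]


# Hypothetical IDs, shown only to illustrate the resource name format.
print(model_path("my-project", "TRL0000000000"))
```

Leaving out the "model" field sends the same request to the standard Google Translate model, which makes side-by-side comparisons straightforward.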

Training Data Offer

If you want to train the custom MT system described above yourself, train your own MT system with the data or use the TM for other purposes, you can purchase the training data here.

I would like to thank Nadja Honeywell for providing feedback on the Spanish translations.

Updated 5/20/2019 for typo. Updated 5/29/2019 to add Omniscien Technologies.