Data

 

Centers for Disease Control and Prevention COVID-19 translations

Language Pair and Version Capture Date Source Words Source Words (Deduplicated) Sample File
English-Spanish v1 2020-06-24 538,842 248,780 TMX Sample
English-Vietnamese v1 2020-06-24 550,066 249,006 TMX Sample
English-Korean v1 2020-06-24 537,204 262,393 TMX Sample
English-Chinese v1 2020-06-24 508,297 254,876 TMX Sample

These translation memories are made available under the Open Data Commons Attribution License. Individual contents of the database are in the public domain.
Source: CDC; Reference to specific commercial products, manufacturers, companies, or trademarks does not constitute its endorsement or recommendation by the U.S. Government, Department of Health and Human Services, or Centers for Disease Control and Prevention; The material is available on the agency website https://www.cdc.gov/ for no charge.
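
The tables on this page report both "Source Words" and "Source Words (Deduplicated)". The sketch below shows one way a TMX file could be read and both figures computed. It is an illustration under stated assumptions, not the method used to produce the counts above: it assumes words are whitespace-separated tokens and that deduplication means dropping exact repeats of a source segment. The names `read_tmx_pairs`, `count_source_words`, and the inline `SAMPLE` data are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx_pairs(tmx_text, src_lang="en", tgt_lang="es"):
    """Return (source, target) segment pairs found in TMX markup."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Fall back to a plain "lang" attribute if xml:lang is absent.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "en" from "en-US".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


def count_source_words(pairs, deduplicate=False):
    """Count whitespace-separated source words (assumed definition)."""
    sources = [src for src, _ in pairs]
    if deduplicate:
        # Assumption: drop exact repeats of a source segment, keeping order.
        sources = list(dict.fromkeys(sources))
    return sum(len(s.split()) for s in sources)


# Hypothetical three-unit sample; the first source segment appears twice.
SAMPLE = """<tmx version="1.4">
  <header srclang="en" datatype="plaintext" segtype="sentence"
          creationtool="example" creationtoolversion="1"
          adminlang="en" o-tmf="none"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands often.</seg></tuv>
      <tuv xml:lang="es"><seg>Lávese las manos con frecuencia.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Wash your hands often.</seg></tuv>
      <tuv xml:lang="es"><seg>Lávese las manos a menudo.</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Stay home when sick.</seg></tuv>
      <tuv xml:lang="es"><seg>Quédese en casa si está enfermo.</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = read_tmx_pairs(SAMPLE)
print(count_source_words(pairs))                    # 12
print(count_source_words(pairs, deduplicate=True))  # 8
```

For a file on disk, the same functions could be fed the file's text, or `ET.fromstring` could be swapped for `ET.parse(path).getroot()`.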

World Bank Open Knowledge Repository translations

Language Pair and Version Capture Date Source Words (Deduplicated) Attribution
English-German v1 2020-11-23 20,804 Attribution Page
English-Spanish v1 2020-12-20 1,116,942 Attribution Page
English-French v1 2020-12-27 1,200,938 Attribution Page
English-Italian v1 2020-12-16 10,941 Attribution Page
English(United States)-Portuguese(Brazil) v1 2020-12-11 335,783 Attribution Page
English-Romanian v1 2020-12-16 6,347 Attribution Page
English-Indonesian v1 2020-11-23 69,488 Attribution Page
English-Chinese v1 2020-12-13 93,922 Attribution Page
English-Turkish v1 2020-12-13 37,148 Attribution Page
English-Ukrainian v1 2020-12-13 3,616 Attribution Page
English-Vietnamese v1 2020-12-14 43,431 Attribution Page

Web-crawled translations of documents published in the World Bank Open Knowledge Repository. This data set was compiled by and is offered by Polyglot Technology LLC. Individual contents of the database are licensed by the World Bank under CC BY 3.0 IGO or CC BY 4.0.

Healthcare.gov translations

Language Pair and Version Capture Date Source Words Source Words (Deduplicated) Sample File
English-Spanish v1 2019-04-16 112,268 TMX Sample

This data set was compiled by and is offered by Polyglot Technology LLC. Individual contents of the data set are in the public domain.

U.S. Department of State News Releases

Language Pair and Version Capture Date Source Words Source Words (Deduplicated) Sample File
English-Arabic v2 2020-09-03 1,915,865
English-Spanish v2 2020-09-03 1,045,304
English-French v2 2020-09-03 1,099,530
English-Hindi v2 2020-09-03 693,624
English-Russian v2 2020-09-03 1,345,858
English-Urdu v2 2020-09-03 775,675
English-Chinese v2 2020-09-03 86,071
English(United States)-Portuguese(Brazil) v2 2020-09-03 806,593
English-Vietnamese v2 2020-09-03 8,918
English-Indonesian v2 2020-09-03 28,011
English-Persian v2 2020-09-03 15,979

Web-crawled translations of U.S. Department of State press releases (February 2017 to October 2020). These data sets were compiled by and are offered by Polyglot Technology LLC. Individual contents of the data sets are in the public domain.
Source: U.S. Department of State

 

Custom Crawling

We also offer customized parallel data crawling for the languages and domains you need. Please contact us at info@polyglot.technology.