Who Offers the Best Chinese-English Machine Translation? A Comparison of Google, Microsoft Bing, Baidu, Tencent, Sogou, and NetEase Youdao
Jun 27, 2018 · 1097 words · 3-minute read
At work, I frequently share Chinese-language articles with English-speaking colleagues and English articles with Chinese-speaking colleagues. I reluctantly started using machine translation last year, after the amount of material to translate became overwhelming. I was pleasantly surprised by its quality, so I wanted to find out which company offers the best product.
For our (very unscientific) blind test, we will be using excerpts of President Xi Jinping’s speech at the 2018 Bo’ao Forum. I chose this speech because if machine translation is to make any headway, it will start with the most formal (and, dare I say, most formulaic) official speeches. Furthermore, the Chinese government has provided an official translation of Xi’s speech, so we have a benchmark to compare the machines against.
Before I reveal the test results, I should note that I had originally planned to use a Xi speech from 2017. But when I fed it to Google Translate, the results were identical to the official translation provided by the Chinese government, which suggests that Google had used the official translation as training material. To keep the test fair, I ran the Bo’ao speech through the various translation sites before the official translation came out in late April.
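The kind of overlap check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for this test: the function names `overlap_ratio` and `looks_memorized`, the 0.95 threshold, and the sample sentences are all hypothetical.

```python
from difflib import SequenceMatcher


def overlap_ratio(machine_output: str, official_translation: str) -> float:
    """Return a 0-1 similarity score between two translations, compared word by word."""
    machine_words = machine_output.lower().split()
    official_words = official_translation.lower().split()
    return SequenceMatcher(None, machine_words, official_words).ratio()


def looks_memorized(machine_output: str, official_translation: str,
                    threshold: float = 0.95) -> bool:
    """Flag output that is identical or nearly identical to the official translation.

    The threshold is an arbitrary assumption, not a calibrated value.
    """
    return overlap_ratio(machine_output, official_translation) >= threshold


# Hypothetical sample strings for illustration only; not real system output.
official = "Friends from around the world are welcome to participate in the expo."
identical = "Friends from around the world are welcome to participate in the expo."
different = "Welcome friends from all countries to participate in China."

print(looks_memorized(identical, official))
print(looks_memorized(different, official))
```

A score at or near 1.0 across a long passage would be a hint that the system has already seen the reference text, which is why a speech without a published translation makes for a fairer test.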
The results are as follows:
- First tier: Google, Microsoft Neural [1], Sogou, Tencent (in no particular order)
- Second tier: NetEase Youdao, Baidu (in no particular order)
- Third tier: Microsoft Bing
Part 1 of the test results is shown below. Errors are marked in red and awkward phrasing in green.
Machine translation has come a long way. Only a few years ago, Google Translate struggled to produce coherent sentences. Now it captures most, if not all, of the main idea. As shown above, an English speaker would have no trouble following Xi’s speech in real time relying only on Google, Microsoft Neural, Sogou, or Tencent.
Another translation product, DeepL, which in my opinion outperforms Google in Spanish-English translation, has not yet made its Chinese-English service available. It is possible that DeepL would do an even better job with Xi’s speech.
Part 2 of the test results:
During the Bo’ao Forum, Tencent launched a massive PR campaign to promote its “AI solution to conference interpretation.” As seen below, the “AI solution” turned out to be more of a publicity stunt. Given Tencent’s decent translation product, though, I suspect it was the Chinese speech-to-text step that went awry. Had the audio been faithfully transcribed, the “AI solution” might have provided a satisfactory English translation.
For now, machines can only be trusted for Chinese-English translations of technical manuals, official speeches and announcements, and serious news articles. Fiction or colloquial conversations would be a stretch. Simultaneous interpretation has the added problem of audio transcription – the noise of the room and the speaker’s dialect make things difficult for even the most experienced human interpreters.
Four Common Machine Mistakes
Below are four types of mistakes machines commonly made when translating excerpts of the Xi speech:
Where the Chinese is an extremely long sentence
- “坚决破除制约使市场在资源配置中起决定性作用、更好发挥政府作用的体制机制弊端”: if we break down this sentence, the main verb and object are “破除弊端” (“remove the defects”), but some machines parse it as “破除制约,使市场发挥作用…。机制弊端”; others parse it as “破除那些让市场…的制约.” This sentence is a challenge for both humans and machines.
Where the Chinese omits previously-mentioned information
- “欢迎各国朋友来华参加” (preceded by a discussion of expos) = “friends from around the world are welcome to participate in the expo,” rather than “welcome friends from all countries to participate in China”
Where the Chinese is a word with multiple meanings (and the less common meaning is used)
- “(政策)落地” = “to materialize,” rather than “to land”
- “(行业)具备开放基础” = “to be in a position to open up”, rather than “to have an open foundation”
- “(同国际经贸规则)对接” = “to integrate” or “to align,” rather than “docking” (I think the machines got “docking” from “molecular docking”)
- “(完善产权制度)是经济竞争力的最大激励” = “provide the biggest boost to the competitiveness of the economy,” rather than “to be the biggest incentive to improve the competitiveness of the economy”
- “重新组建” = “to reorganize,” rather than “to re-establish”
- “(这不是)一般性的会展,(而是我们主动开放市场的重大政策宣示和行动)” = “another ordinary exhibition,” rather than “a general exhibition”
Omission
- “空气清新才能吸引更多外资” = “only fresh air attracts more foreign investment,” rather than “fresh air can attract more foreign capital” or “the air is fresh to attract more foreign capital” (Oddly, no machine got this straightforward sentence right; each dropped the “only” conveyed by “才能”.)
[1] Microsoft researchers recently developed a new translation system that they say has achieved “human parity” in translating a test set of news stories from Chinese to English. This new translation tool is available for Chinese-English only.