In my last piece for Unseen Japan, I wrote about the Tokyo activists who want to cancel the Olympics. Run one sentence from that piece through Google Translate, and the Japanese comes out half-decent:
Polling shows that more than half of Tokyo residents believe that the Games should be further postponed or canceled altogether.
The translation reads not quite naturally in Japanese, a bit like the work of an advanced but far from fluent Japanese learner. It forgets that “Games” refers to the Olympic Games and makes several unnatural word choices. But overall, it sounds like a Japanese sentence.
Translation of a Japanese news article on the exact same subject into English performs a bit worse, though not disastrously:
In a poll conducted by NHK in July this year, “should be postponed further” (35%) and “should be canceled” (31%) far exceeded “should be held” (26%).
Accurate but very clumsy. The English translation sounds quite professional at first. But the program can’t figure out what to do with the lack of subjects and objects in the Japanese. So ultimately, it delivers a semi-incomplete sentence.
Things get quite a bit worse if you feed Google Translate anything less than straightforward. Here’s a machine translation of the first paragraph of Miyazawa Kenji’s Night on the Galactic Railroad:
“Then, do you all know what this vaguely white thing, which was said to be a river like that, or after the milk flowed, really is?”
The English, desperately cleaving to the sentence structure of the Japanese, comes out close to nonsensical. Google Translate remains practically useless for accurate Japanese translation, at least for anything beyond the simplest of constructions.
Even worse, what about this Japanese translation of one of the first sentences of Moby-Dick?
The pale Usher—threadbare in coat, heart, body, and brain; I see him now.
There are almost too many bizarre things to count for such a short sentence, including misuse of a dash and misinterpretations of the words “Usher,” “heart,” “brain,” “threadbare,” and even “see.”
Still, machine translation has come a long way in recent years, with impressive advances in AI and neural machine translation. So what use, if any, does machine translation actually have for a language like Japanese?
Advances in machine translation
At a very basic level, machine translation uses algorithms to substitute words in one language for another. For obvious reasons, this doesn’t work very well, so more advanced algorithms attempt to substitute bigger phrases and even full sentences based on context.
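To see why plain word substitution falls short, here is a minimal sketch in Python. The glossary, the sample sentence, and the function name `word_for_word` are all made up for illustration; no real translation system is this simple.

```python
# Toy Japanese-to-English glossary. The entries are illustrative assumptions,
# not taken from any real translation system.
GLOSSARY = {
    "私": "I",
    "は": "(topic)",
    "寿司": "sushi",
    "を": "(object)",
    "食べます": "eat",
}


def word_for_word(tokens):
    """Swap each token for its glossary entry, keeping the source word order."""
    return " ".join(GLOSSARY.get(token, f"<{token}>") for token in tokens)


# The sentence is split by hand here; a real system would have to find
# the word boundaries itself.
tokens = ["私", "は", "寿司", "を", "食べます"]
print(word_for_word(tokens))  # I (topic) sushi (object) eat
```

Every word is "correct," yet the result is not English: the particles have no equivalent and the verb sits at the wrong end of the sentence.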
In recent years, neural machine translation, which uses a type of deep-learning AI called a neural network, has come to prominence. Neural networks can look at whole sentences instead of individual words, require a fraction of the memory of traditional software, and work far faster.
Google, Facebook, and most large tech companies have adopted neural machine translation, which has produced vastly better translations than were possible just a few years ago. The first scientific paper describing a neural network in use for machine translation was published as recently as 2014. In other words, this is a relatively new field, considering that machine translation in some form has been in development since before World War II.
Neural machine translation does a much better job of creating natural-sounding translations. But its output can still be offensively wrong or even completely unintelligible.
“At present, machine translation does not rival a good mother tongue linguist, but there still is a place for AI and Machine Translation in the translation services value chain,” William Mamane, Head of Digital Marketing at Tomedes, a professional language services agency, told ReadWrite.com.
The Machines Have Their Work Cut Out for Them
Machine translation is being deployed in several industries, primarily by language service providers. But only a tiny percentage of them use machine translation based on AI or neural networks, and they do so for a negligible portion of their actual work. Neural machine translation remains at a primarily developmental and experimental stage, although it undoubtedly has a place in the current translation market.
But capacity aside, Japanese and English are notoriously distant languages. Do the same trends even apply to Japanese?
Trans-Euro, a translation agency, recently took an in-depth look at the Japanese version of DeepL, a neural machine translation tool praised as one of the best available. They describe the results as “understandable most of the time” while stylistically being “quite different” from a human translator. “DeepL sticks to the original Japanese text which results in English that does not sound natural and is not very engaging for readers,” the analysis states.
This is an impressive accomplishment for a neural machine translation tool, especially given the results from Google Translate above. However, “understandable most of the time” is embarrassingly far from a competent product. And given that DeepL still produced a number of misinterpretations and factual errors, the output would be useless for a company wanting an accurate translation of a Japanese text. That would still require a human translator to thoroughly check and revise it.
As a translator of Japanese light novels and manga, I’ve found that many (or even most) sentences in Japanese need to be completely reimagined from scratch. Scrap word-order, ignore dictionary definitions, and add a lot of context that tends to be only implicit in Japanese but is grammatically necessary for English—and that’s just the beginning.
Word substitution, phrase substitution, and even sentence substitution are not guaranteed to work in a language as different from English as Japanese. “The varying emphases between Japanese and English… also occur at the macro-level of larger stretches of discourse,” writes Judy Wakabayashi in her book Japanese-English Translation: An Advanced Guide. “How Japanese writers combine sentences, organise paragraphs and structure whole texts does not always match English practices.”
The Japanese-English Gulf
Most linguists and translators agree that Japanese is one of the hardest languages to translate into English. So why is Japanese so hard for a computer to translate?
There are a lot of reasons.
One of the most important is that Japanese uses Subject-Object-Verb sentence order, while English uses Subject-Verb-Object. That means the elements of a sentence almost always need to be reordered when translating between English and Japanese. Japanese doesn’t even require a subject or object to form a grammatical sentence, so when translating into English, a machine has to infer the correct subject and object from context, which is extremely difficult for a machine.
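A rough sketch of that reordering problem, assuming the sentence has already been parsed into its roles (which is itself the hard part). The function `to_english_order` and the `[someone]` placeholder are hypothetical, purely for illustration.

```python
from typing import Optional


def to_english_order(verb: str,
                     obj: Optional[str] = None,
                     subject: Optional[str] = None) -> str:
    """Arrange already-translated pieces in English Subject-Verb-Object order.

    Japanese would order them (subject) (object) verb, and may leave the
    subject out entirely.
    """
    if subject is None:
        # English needs a subject, but the source sentence gave none.
        # A real system has to guess from context; this sketch only flags the gap.
        subject = "[someone]"
    parts = [subject, verb]
    if obj is not None:
        parts.append(obj)
    return " ".join(parts)


print(to_english_order("ate", obj="sushi", subject="Tanaka"))  # Tanaka ate sushi
print(to_english_order("ate", obj="sushi"))                    # [someone] ate sushi
```

The second call is the common case in Japanese: nothing in the sentence itself says who ate the sushi, so the translator, human or machine, has to supply it.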
Other grammatical issues include sentence-ending particles like よ and わ in Japanese, which transform the tone of the sentence despite having no explicit meaning, Japanese’s variety of counters, the lack of spaces between words, and many more.
That doesn’t even begin to enter the realm of issues above the grammatical level, as implied by Wakabayashi. There is Japanese keigo, or honorific/humble expressions that take into account social relationships, and other social concepts and norms that do not exist outside of Japan.
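The lack of spaces is easy to demonstrate. The sketch below lists every way a string can be split into words from a small vocabulary; the vocabulary and the function name `all_segmentations` are assumptions chosen for illustration, and real tokenizers use far larger dictionaries plus statistical scoring.

```python
def all_segmentations(text, vocab):
    """Return every way to split `text` into consecutive words found in `vocab`."""
    if not text:
        return [[]]  # one way to segment nothing: the empty split
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in vocab:
            # Keep this prefix as a word and segment whatever remains.
            for rest in all_segmentations(text[end:], vocab):
                results.append([prefix] + rest)
    return results


# Hypothetical mini-vocabulary: east, Tokyo, Kyoto, metropolis, Tokyo Metropolis.
vocab = {"東", "東京", "京都", "都", "東京都"}
for words in all_segmentations("東京都", vocab):
    print(" | ".join(words))
# 東 | 京都
# 東京 | 都
# 東京都
```

Three characters, three valid readings. Before a machine can translate a Japanese sentence, it has to decide where the words are.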
Takako Aikawa at the Massachusetts Institute of Technology did a 2018 study reviewing neural machine translation’s improvements and remaining challenges with Japanese. Major improvements included machines getting better at translating Japanese counters and context-dependent verbs, like する, which can have dozens of different meanings depending on context.
Major issues included pronouns, tag-questions (sentences that end in a negative), and of course, keigo.
“None of the machine translation outputs included irregular honorific and humble forms [of keigo],” Aikawa wrote. “The lack of humble or honorific forms simply make these sentences less fluent or non-natural.”
The striking point here is that machine translation struggles with the same components of Japanese and English that language learners do. But advancements in sentence-to-sentence accuracy don’t even begin to approach discourse-level issues.
Conclusion: Can AI Translate Japanese in 2020? In 2030?
Neural machine translation suffers from a few key issues, including biased data. Since neural networks learn from real-world patterns, they reproduce whatever skew exists in the data sample. For example, machine translations have frequently inserted the pronoun “her” into English translations about nurses when no gender was specified in the original language, simply because the machine was trained on data that tends to refer to nurses as “she.”
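The mechanism is easier to see in miniature. The sketch below is not a neural network; it simply picks whichever pronoun appeared most often with a noun in a tiny, made-up training set. The data and the function name `most_likely_pronoun` are assumptions for illustration only.

```python
from collections import Counter

# Made-up training data: (noun, pronoun used for that noun in some sentence).
TRAINING_PAIRS = [
    ("nurse", "she"), ("nurse", "she"), ("nurse", "she"), ("nurse", "he"),
    ("engineer", "he"), ("engineer", "he"), ("engineer", "she"),
]


def most_likely_pronoun(noun, pairs=TRAINING_PAIRS, default="they"):
    """Pick the pronoun seen most often with `noun`; use `default` if unseen."""
    counts = Counter(pronoun for seen_noun, pronoun in pairs if seen_noun == noun)
    if not counts:
        return default
    return counts.most_common(1)[0][0]


print(most_likely_pronoun("nurse"))     # she
print(most_likely_pronoun("engineer"))  # he
print(most_likely_pronoun("teacher"))   # they
```

The source sentence never mentions gender, so the "translation" is decided entirely by the skew in the training data.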
“NMT is still far from reliable,” writes Sharon Zhou, a Stanford Computer Science Ph.D. studying AI and machine learning, for Skynet Today. “We can translate single sentences, but not longer pieces of texts. We can translate well enough for humans to get the gist, but not reliably enough for applications where accuracy is crucial, and not artistically enough for applications where elegance is important.”
Machine translation of Japanese is better than it’s ever been, but still practically useless for any field that requires an accurate translation. If inaccuracy is acceptable, then machine translation has a multitude of applications. Automatic translation of websites and even pocket translators that allow businesses to communicate with tourists are abundant in 2020 and have real use.
With this perspective in mind, the increasing proliferation of machine translations is concerning. Neural machine translation is fundamentally error-prone. But because these tools do a convincing job, and do it instantaneously, they are oh-so-tempting.
In conclusion: AI currently cannot actually translate Japanese. It can fake it, but the current results even from DeepL wouldn’t come close to passing a translation test.
“Especially for online content we cannot recommend [DeepL], as writing style in Japanese and English can be quite different,” Trans-Euro’s report concluded.
But will AI ever be able to translate Japanese?
In a sense, yes, and in another sense, no. Machine translation of Japanese will continue to get much better. As a result, human translators will increasingly work alongside machine translations. Machine translations will speed up the work of human translators by taking care of simple language issues and allow them to focus on more complex challenges.
Some industry insiders have an extremely optimistic view. “I would estimate 80% of the material that corporate customers pay to have translated on the market today, it will be machine translatable in the next one to three years,” One Hour Translation CEO Ofer Shoshan told Forbes in 2018. Certainly that hasn’t come to fruition with Japanese yet.
Whether it will, at least in the medium term, really depends on whether or not corporations and clients want accurate translations. If they do, they’ll almost certainly continue to use human translators far into the future.
A future of actual collaboration between humans and machines is on the table. Does that mean AI will be able to truly translate Japanese? Not necessarily, but it has a fighting chance.