by lajamerr 1477 days ago
I've been following machine translation for the past 8 years or so, just as a consumer, and I still haven't found anything that provides good results out of the box; everything still requires heavy manual editing or complete rewrites of sentences.

Source Text:

俺は34歳住所不定無職。 人生を後悔している真っ最中の小太りブサメンのナイスガイだ。 つい三時間ほど前までは住所不定ではない、ただの引きこもりベテランニートだったのだが、気付いたら親が死んでおり、引きこもっていて親族会議に出席しなかった俺はいないものとして扱われ、兄弟たちの奸計にハマり、見事に家を追い出された。

Machine Translation:

I am 34 years old Address indefinite job. A fat musamen's nice guy in the middle of regret over life. It wasn't address indefinite until about three hours ago, it was just a withdrawal veteran neat, but when I realized it, my parents were dead, I was withdrawn and I didn't attend a relatives meeting, Hama to the brothers、I was kicked out of the house.

Still unintelligible, as are the results from Google Translate and other services. I don't know what it will take to get decent translations, but it still seems far off.

6 comments

Here's a translation (to British English) using DeepL:

"I'm 34 years old, unemployed with no fixed address. I'm a small, fat, ugly, nice guy in the middle of regretting my life. Just three hours ago I wasn't of no fixed address, just a reclusive veteran NEET, but when I found out my parents had died and I was treated as if I wasn't there because I was a recluse and didn't attend the family council, I fell for my brothers' schemes and was successfully kicked out of the house."

I have no idea what the original means, but at least the translation makes sense.

And here is how GPT3 translates it:

I am a 34-year-old man with no fixed address. I am a nice guy who is a little overweight and middle-aged. Until three hours ago, I was a veteran hikikomori who was not homeless, but when I realized that my parents had died, I was treated as if I did not exist because I did not attend the family meeting, and I was caught in the scheme of my brothers and sisters, and I was expelled from my house.

DeepL seems to do a better (and cheaper) job, but both are very intelligible.

I really like that GPT3 translation; it flows a lot better when reading and doesn't make me pause.

Is there a dedicated GPT3 translation service?

The problem with using GPT3 for a straightforward translation service is the cost of the underlying API. You would never be able to compete at scale with DeepL, which generally does a great job and costs a few dollars for unlimited usage.

I find it very confusing that they all completely change the meaning into something else.

Japanese can be ambiguous, but not that ambiguous.

I lean towards the GPT3 translation being closest to the truth.

That makes significantly more sense.

This is what human translators came up with.

"I was a thirty-four-year-old man with no job and nowhere to live. I was a nice guy, but I was on the heavy side, didn't have good looks going for me, and was in the midst of regretting my entire life. I'd only been homeless for about three hours. Before that, I'd been the classic, stereotypical, long-time shut-in who wasn't doing anything with his life. And then, all of a sudden, my parents died. Being the shut-in that I was, I obviously didn't attend the funeral, or the family gathering thereafter. It was quite the scene when they kicked me out of the house afterward."

So the DeepL one seems closer, though the human translators do take some liberties and are not entirely 1:1 accurate.

Not knowing Japanese and just having seen the translations posted here, I'm inclined to trust the machine ones more. "No fixed address" seems to be more accurate than "nowhere to live". Not sure if the first two sentences should be past tense. Again, unsure about "homeless" and 住所不定 appears to be a catchphrase which should always translate to the same thing. The vibe I get from machine translations is it refers to the sort of people who live around in capsule hotels etc.

Of course, I could be totally wrong. But I couldn't know. I'm a big consumer of machine translated text (with the purpose of understanding the information contained) and I do feel like it's game over for casual human translations. Usually, with a tiny bit of effort (some googling) you can figure out the ambiguous parts. If I need help I'd rather just ask about a specific word, phrase or ask a general question about the text. Human translators veer too far off the original trying to produce "proper" text in the target language, which usually destroys information. Machine translations fail in a more obvious way.

For context, this paragraph is the beginning of Mushoku Tensei [1], a popular light novel series. (That's why there are human translators for this otherwise obscure bit of text!) I haven't exactly read it, but the subsequent text [2] suggests that the protagonist got expelled from his family and indeed became homeless. The machine translator might lack this exact context, but ideally it could still recognize that merely having no fixed address wouldn't fit the mood here.

[1] https://en.wikipedia.org/wiki/Mushoku_Tensei

[2] https://ncode.syosetu.com/n9669bk/1/ (it's the norm for recent light novel series to be serialized on free websites and then get published)

I used to keep track of the state of machine translation some years back.

I think the way you measure the success of an automated translation is edit distance, i.e. how many manual edits you need to make to a translated text before you reach some acceptable state. I suppose it's somewhat subjective, but it is possible to construct a benchmark and allow for multiple correct results.
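To make that concrete, here is a minimal sketch of such a benchmark score: a word-level Levenshtein distance between the machine output and each of several acceptable reference translations, keeping the smallest. The function names (`edit_distance`, `edits_needed`) and the whitespace tokenization are my own assumptions for illustration, not how any particular MT evaluation tool does it.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the tokens of `a` seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]


def edits_needed(mt_output: str, references: Sequence[str]) -> int:
    """Smallest word-level edit distance from the MT output to any reference."""
    hyp = mt_output.split()  # naive whitespace tokenization (an assumption)
    return min(edit_distance(hyp, ref.split()) for ref in references)


if __name__ == "__main__":
    # Toy strings for illustration only, not real benchmark data.
    refs = ["I am 34 years old", "I'm 34"]
    print(edits_needed("I'm 34 years old", refs))
```

Taking the minimum over references is what allows for multiple correct results: an output is only penalized for its distance to the closest acceptable translation. A real benchmark would probably also normalize by sentence length and tokenize more carefully, especially for languages without spaces.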

The best resources I knew back then were:

VISL's CG-3 self-reported a competitively low edit distance compared to Google Translate: https://visl.sdu.dk/constraint_grammar.html -- It is a convincing argument that in order to beat Google Translate, you want less fuzzy machine learning and more structural analysis. But the abstraction unfortunately requires a rather deep knowledge of any one particular language's grammar; having a PhD in computational linguistics helps.

Apertium has an open-source pipeline: https://apertium.org/ -- seems to be much more like an open-source approach with a quality similar to Google Translate (although I don't know if it's better or worse; probably slightly worse in most cases, and with a slightly lower coverage).

The VISL translator is not CG-3 - it's GramTrans, with the commercial vendor being GrammarSoft ApS. CG-3 is merely one of the general purpose langtech tools used in the pipeline. Apertium also uses CG-3.

Both GramTrans and Apertium are rule-based. Very similar technology.

(I wrote CG-3, and work for both GrammarSoft and Apertium.)

Thanks for clarifying, Tino.

Here is a hilarious translation using the WIPO translate tool (it's trained on patents):

DWARF is 34 years old. There is a knife of small thickening butenes in the middle of the life of the person. To solve the problem in which, although it has been found that there was only an unimindeterminate address before the third time, the parent is dead when it was noticed, and it was treated as unattended and unattended in the parent conference, and it was found that there was no unattended parent in the parent meeting.

Unrelated to machine translation, the Japanese text appears to refer to a concept for which Chinese has the slang term 家裡蹲 (jiālǐ dūn), "home-squatter".

https://zh.wikipedia.org/wiki/%E5%AE%B6%E8%A3%A1%E8%B9%B2

Libretranslate's version needs work:

I am 34 years old It’s a small but fat busamen nightス who regrets his life. It was just a veteran neeth, but my parents died when I noticed, and I was treated as a thing that I did not attend the affiliate meeting, and I was マed to the si計s, and I was very surprised.

I have also been following machine translation, as a former professional translator and currently an academic supervising research on the use of machine translation in language education.

For some real-world applications, MT can get the job done while saving a lot of time and money, though the users must understand its weaknesses.

I just did my own comparison of Japanese-to-English translation by LingvaNex, DeepL, Google Translate, and Bing Translator. For this particular excerpt from a newspaper article, DeepL wins hands down.

Input text: 新型コロナウイルスの感染者の確認を受けて厳しい地域封鎖が実施されてきた北朝鮮で、5月下旬から平壌を含む各地の封鎖が徐々に緩和されていることが、北朝鮮の事情を知りうる複数の関係者の話でわかった。食料不足の中、封鎖の長期化で「死活問題」の農作業の人手が足りなくなることを避けたり、梅雨や台風の季節を前に必要な土木工事を終わらせたりする狙いがあるという。(Source: https://digital.asahi.com/articles/ASQ615V30Q50UHBI02G.html)

LingvaNex: North Korea has been under severe regional blockade following confirmation of new coronavirus infected people, and the blockade of all areas including Pyongyang has been gradually eased since late 5. I found it in the story of several people who know the situation. There is a aim to avoid the lack of labor for agricultural work on the "vulinary problem" due to prolonged blockade in the midst of food shortages, and to end the necessary civil engineering work before the rainy season and typhoon season.

DeepL: North Korea, which has been under a strict regional blockade following the confirmation of a person infected with a new type of coronavirus, has gradually eased its blockade in various areas, including Pyongyang, since late May, according to several sources with knowledge of the situation in North Korea. It is said that the aim is to avoid a shortage of manpower for "life-and-death" farm work due to the prolonged blockade amid food shortages, and to finish necessary civil engineering work before the rainy season and typhoon season.

Google Translate: It is possible to know the situation in North Korea that the blockades in various parts of the country, including Pyongyang, have been gradually eased since late May in North Korea, which has been severely blocked after being confirmed as infected with the new coronavirus. I learned from the stories of multiple parties. In the midst of food shortages, the aim is to avoid running out of labor for the "life and death problem" due to the prolonged blockade, and to finish the necessary civil engineering work before the rainy season and typhoon season.

Bing Translator: North Korea, which has been in place since the confirmation of a new coronavirus case and has been implementing a strict regional lockdown, has gradually eased its lockdown in various parts of the country, including Pyongyang, since late May, according to several people familiar with the situation in North Korea. Amid food shortages, the aim is to avoid a shortage of workers for agricultural work due to the "life-or-death problem" due to the prolonged lockdown, and to finish necessary civil engineering work ahead of the rainy season and typhoon seasons.

Adding GPT3 translation:

In North Korea, where strict regional lockdowns have been imposed following confirmation of infections with the new coronavirus, the lockdowns in Pyongyang and other areas are gradually being eased, according to multiple sources familiar with the situation in North Korea. The aim is to avoid a shortage of labor for agricultural work, which is a "matter of life and death," in the face of food shortages and to finish necessary civil engineering work before the rainy season and typhoon season.

Once again, I find that DeepL has the most precise translation. GPT3 produces very natural text, but it does not follow the original as closely.

Here, "という" at the end of the last sentence was properly translated as "It is said that" by DeepL, but was omitted by GPT3.

Both the GPT3 and DeepL translations are vastly superior to all the others, though...