Hacker News
by neosystem 1477 days ago
Really interesting.

I tried out "Hey, how are you? I don't understand how it can be so warm today." in my native language, and systran is the only one that got it 100% correct. Google was close, but reversed an article and a noun. The others mixed up "don't understand" with "don't know", which are similar but different enough to sound unnatural.

I've always thought that the best way to assess these systems is how they handle colloquial speech, stuff that we often take for granted but that's really quite strange to translate "literally". I bet even that phrase -- "take for granted" -- would be difficult to translate even though I'm certain most languages have a phrase for that exact sentiment.

1 comment

I put this exact comment in and asked for a Korean translation:

> 정말 흥미 롭습니다.

Almost correct, but "흥미롭습니다" (lit. "is interesting", with an implied "to me") should be a single word.

> "이봐, 잘 지내? 오늘 어떻게 따뜻해질 수 있을지 모르겠습니다. "모국어로 systran이 100 %을 얻은 유일한 언어입니다. 구글은 가까웠지만 기사와 명사를 뒤집었다. 다른 사람들은 "알지 못한다"와 "알지 못한다"를 섞었다.

"I tried out" is completely missing, and "in my native language" is joined to the next sentence "and systran is...".

The quoted sentence is, when translated back to English, something like "Hey, are you going well? I don't know how [something] can get warm today." Like most other machine translators, LingvaNex is clueless about Korean honorifics (the first sentence is informal while the second is formal here). It does get a colloquial Korean expression for "I don't understand" (lit. 이해가 안 된다) but doesn't get the dummy pronoun, so it somehow assumes an unspecified entity as the subject. The position of the closing quote is also off.

The first quasi-sentence after that quotation became something like "[it] is the only language that systran got 100 % in a native tongue". Even ignoring the incorrectly joined "in a native tongue", the dummy pronoun seems to be the culprit again here, since "the only one" got interpreted as a language, not systran.

The next sentence reads like "Google was nearby but swapped a post with a noun." The Korean word 가깝다 (lit. nearby) has a slightly different nuance from English "close", so it has to be paraphrased. LingvaNex interpreted "article" as, uh, a newspaper article, which is not a synonym in Korean (the correct word is 관사). And it somehow also switched back to informal expressions.

The final sentence reads like "other people mixed 'don't know' and 'don't know'." This is kinda hilarious; LingvaNex actually understands both expressions are more or less equivalent in Korean but doesn't know when they have to be distinct.

> 나는 항상 이러한 시스템을 평가하는 가장 좋은 방법은 구어체 연설을 처리하는 방법, 우리가 종종 당연하게 여기는 것이지만 "문자 그대로"번역하는 것은 정말 이상하다고 생각했습니다". 나는 대부분의 언어가 그 정확한 감정에 대한 문구를 가지고 있다고 확신하지만 그 단어 ( "당연한 것으로 받아들이십시오")조차도 번역하기가 어려울 것입니다.

The first half is so hopelessly mangled that I can't give an English equivalent. I mean, each part is reasonably translated (including the phrase "stuff that we often take for granted", whose translation is pretty much correct) but the wrong ordering messes everything up.

The second half is more reasonable: "I'm confident that most languages have phrases for the exact feeling but even that word ('take it to be natural') would be hard to translate." What "the exact feeling" refers to is unclear due to the reordering, and "take for granted" got translated too literally, but otherwise it sounds fine.