by delidumrul 2868 days ago
Many mistranslations may be the result of a lack of work and research on the Turkish language. I don't agree that Turkish does not get along well with computers; there is simply no mature body of work on it yet. If a language is understandable, analyzable, executable, expressible by humans, computers most probably can process it (this is a claim, not a fact).
2 comments

> If a language is understandable, analyzable, executable, expressible by humans, computers most probably can process it (this is a claim, not a fact).

In English, a sentence[1] is "a set of words that is complete in itself, typically containing a subject and predicate, conveying a statement, question, exclamation, or command, ...".

In Turkish, sometimes you need a paragraph, sometimes you need a page, and sometimes you need the whole story to provide "a set of words that is complete in itself".

Try these for size[2,3].

I especially love the translations of the word "tane". Or "say" here[4].

I am going to stop before I keel over. Sure, you can try to codify a rule for each and every "set of words that is complete in itself", but I doubt that set is finite.

The way things are going, I fully expect to see people being required to use Turkish in a way computers can understand before computers can deal with Turkish.

[1]: https://www.google.com/search?q=define%3Asentence

[2]: https://translate.google.com/#tr/en/%C3%A7akmak%20%C3%A7akma...

[3]: https://translate.google.com/#tr/en/%C3%A7ak%20bir%20tane

[4]: https://translate.google.com/#tr/en/say%20ba%C5%9Ftan

I feel like you are mixing multiple things together, and giving examples from Google Translate doesn't really support your argument - it just shows that Google Translate sucks for Turkish.
English is not immune to ambiguities and thus not superior to Turkish in this regard:

https://translate.google.com/#en/tr/fruit%20flies%20like%20a...

One way to test the effectiveness of a translator is to translate a passage from L1 to L2 and then back again. If, after repeating this many times, the final L1 version still conveys the meaning of the original L1 passage, then it is a stable translator. The Linguee translator runs this test for English-German and claims to be more successful than Google Translate on it. That suggests Google Translate does not give strong results even for English-German. Considering that German is a much closer language to English than Turkish is, I don't expect much from Google Translate for Turkish. Thus, I don't see it as proof of any argument.
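For what it's worth, the round-trip test could be sketched roughly like this in Python. This is only an illustration under loud assumptions: `round_trip`, `is_stable`, and `make_lookup` are hypothetical names, the "translators" are toy lookup tables rather than any real translation API, and exact string equality stands in for "gives the meaning of the original passage", which is a much cruder check than a human judgment. The table entries reuse the "Dolar düşsün" example quoted in this thread; the entry mapping "Dolar rüyası" back to "Dollar dream" is assumed.

```python
from typing import Callable, Dict, List

Translator = Callable[[str], str]


def make_lookup(table: Dict[str, str]) -> Translator:
    """Build a toy translator: look the text up, or return it unchanged."""
    return lambda text: table.get(text, text)


def round_trip(text: str, forward: Translator, backward: Translator,
               rounds: int = 3) -> List[str]:
    """Return the L1 text after each L1 -> L2 -> L1 round trip."""
    history = [text]
    for _ in range(rounds):
        text = backward(forward(text))
        history.append(text)
    return history


def is_stable(text: str, forward: Translator, backward: Translator,
              rounds: int = 3) -> bool:
    """Crude stability check: the final L1 text equals the original exactly."""
    return round_trip(text, forward, backward, rounds)[-1] == text


# Toy tables standing in for a real translator (assumed data).
tr_to_en = make_lookup({"Dolar düşsün": "Dollar dream",
                        "Dolar rüyası": "Dollar dream"})
en_to_tr = make_lookup({"Dollar dream": "Dolar rüyası"})

print(round_trip("Dolar düşsün", tr_to_en, en_to_tr, rounds=2))
# ['Dolar düşsün', 'Dolar rüyası', 'Dolar rüyası']
print(is_stable("Dolar düşsün", tr_to_en, en_to_tr))  # False
print(is_stable("Dolar rüyası", tr_to_en, en_to_tr))  # True
```

Note that the drifted sentence is itself "stable" under this check, so stability alone says nothing about whether the first translation was right.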
On Google Translate, "Dolar düşsün" → "Dollar dream" → "Dolar rüyası".
> not superior to Turkish in this regard

Computers cannot deal with the ambiguities of Turkish. That says nothing about English being superior to Turkish.

However, because it has no central controlling authority and has not gone through censorship drives, English does have a much richer vocabulary in common daily use for expressing various subtle distinctions more specifically, whereas a Turkish speaker has to rely on context that is not embedded in the grammatical structure or in the specific words used.

These problems are not specific to Turkish. All languages pose different challenges; with focused research, almost all are solvable.
I agree with you. It is true that from a computational linguistics perspective Turkish is more complex in general, but it is indeed possible to tackle all of these issues given enough time and energy.

Lately there have been bigger advances in the area (e.g. https://arxiv.org/pdf/1805.07946.pdf).