Hacker News
by stonewhite 2670 days ago
Turkish, being a member of the Ural-Altaic language family just like Finnish, suffers from similar problems. Google Translate has improved a lot over the years, yet it still generates laughable text at best. Stemmers used to produce show-stopping word stems (I don't really know the current situation).

It has, however, imported a lot of technical terms from various European languages, which makes technical texts more legible because these contexts lack long compound words.

Similarly, from tweets:
yiyecek = food
yiyecek miydi = was he/she going to eat that? (the 2nd word is not a separate word, just a question particle that is written separately)

göz = eye
gözcü = scout
gözlük = glasses
gözlükçü = glasses salesman
gözcülük = scouting
gözlükçüler = glasses salesmen
gözlükçüydüler = they were glasses salesmen
gözlükçü müydüler = were they glasses salesmen?

Also an all-time classic: çekoslovakyalılaştıramadıklarımızdan mısınız = are you one of those people whom we tried unsuccessfully to turn into Czechoslovakian citizens?
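To make the göz examples concrete, here is a minimal Python sketch of why such words are hard on naive stemmers: it peels known suffixes off the right-hand side until it reaches a known root. The `SUFFIXES` list, the `ROOTS` set and the `segment` function are hypothetical and hand-picked to cover only the words listed above; a real analyzer needs a full lexicon and proper morphophonemic rules.

```python
# Hypothetical, hand-picked inventory covering only the "göz" examples.
# "ydü" stands for the buffer consonant y plus the past-tense marker.
SUFFIXES = ["ler", "lük", "cü", "çü", "ydü"]
ROOTS = {"göz"}


def segment(word):
    """Split a word into [root, suffix, ...] by stripping suffixes from the right."""
    suffixes_found = []
    rest = word
    while rest not in ROOTS:
        for suffix in SUFFIXES:
            if rest.endswith(suffix) and len(rest) > len(suffix):
                suffixes_found.insert(0, suffix)
                rest = rest[: -len(suffix)]
                break
        else:
            # No suffix matched and we have not reached a known root.
            raise ValueError(f"cannot segment {word!r}")
    return [rest] + suffixes_found


print(segment("gözlükçüler"))     # ['göz', 'lük', 'çü', 'ler']
print(segment("gözcülük"))        # ['göz', 'cü', 'lük']
print(segment("gözlükçüydüler"))  # ['göz', 'lük', 'çü', 'ydü', 'ler']
```

Note that the same three letters of root produce very different meanings depending on suffix order (gözcülük vs. gözlükçü), so cutting everything back to "göz" throws away most of the information.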


The Ural-Altaic language family is today considered an obsolete concept [1], and the families it grouped together are considered unrelated.

[1] https://en.wikipedia.org/wiki/Ural%E2%80%93Altaic_languages

Models and software created for one language mostly fit the others. E.g., two-level finite-state morphology was invented for Finnish and was very successfully adapted to parse Turkish words.

So, the opinions of linguists aside, the languages that make up the Ural-Altaic family are not that far apart from each other.

Yes, but both languages share some important structural similarities: rich morphology, extreme agglutination, and vowel harmony.
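As a rough illustration of vowel harmony on the Turkish side, here is a small Python sketch that fills in a suffix's vowel from the last vowel of the stem. The `attach` function and the template notation (capital `A` for a/e, capital `I` for ı/i/u/ü) are assumptions made for this example, loosely following the way such suffixes are often written down; it handles lowercase input only and ignores loanword exceptions and consonant changes.

```python
BACK = set("aıou")      # back vowels
FRONT = set("eiöü")     # front vowels
ROUNDED = set("oöuü")   # rounded vowels


def last_vowel(text):
    """Return the last vowel in text, which governs harmony."""
    for ch in reversed(text):
        if ch in BACK or ch in FRONT:
            return ch
    raise ValueError(f"no vowel in {text!r}")


def attach(stem, template):
    """Append a suffix template, resolving A (a/e) and I (ı/i/u/ü) by harmony."""
    out = stem
    for ch in template:
        if ch == "A":
            out += "a" if last_vowel(out) in BACK else "e"
        elif ch == "I":
            v = last_vowel(out)
            if v in BACK:
                out += "u" if v in ROUNDED else "ı"
            else:
                out += "ü" if v in ROUNDED else "i"
        else:
            out += ch
    return out


print(attach("göz", "lAr"))    # gözler
print(attach("göz", "lIk"))    # gözlük
print(attach("kitap", "lAr"))  # kitaplar
```

The same plural suffix surfaces as -ler after göz and -lar after kitap (book), which is one reason a plain list of suffix strings is not enough for either language.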