by orbital-decay 242 days ago
That happens by default in low-resource languages, no bad translations needed. They lack either enough written material to train an LLM, or labels for time periods and the various dialects in a continuum. For example, even the best multilingual models will lump all Berber languages into one unstable abomination nobody speaks, usually writing it in Neo-Tifinagh. Not much can be done about that; training a model on all of these varieties would require a huge specialized effort.

And it's a lot more profitable to improve sex mode than to hire a small army of native speakers to make it not suck at Greenlandic.
What makes Greenlandic special among the ~7000 languages in the world? Most of them are low-resource as well. To train a model on all of them you'd also need a ton of specialized linguists and ML people, neither of which grow on trees. And that's only one of the many things generalist models are supposed to master. The scale is impossible; this needs to be done by the models themselves when (if) they get smart enough.