| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by orbital-decay 242 days ago
	That happens by default in low-resource languages, no bad translations needed. They don't have enough either written material to train an LLM, or labels for time periods and various dialects in a continuum. For example even the best multilanguage models will lump up all Berber languages into one unstable abomination nobody speaks, usually writing it in Neo-Tifinagh. Not much can be done about that, training a model in all varieties of these would require a huge specialized effort.

1 comments

1f60c 242 days ago

And it's a lot more profitable to improve sex mode than to hire a small army of native speakers to make it not suck at Greenlandic.

link

orbital-decay 242 days ago

What makes Greenlandic special among ~7000 languages in the world? Most of them are low-resource as well. To train a model in all of them you also need a ton of specialized linguists and ML people, neither of which grow on trees. And it's only one thing generalist models are supposed to master, out of many. The scale is impossible, this needs to be done by models themselves when (if) they get smart enough.

link