| HN Mirror


	by tgv 4 days ago
	Idk which models you refer to, but I tested a bunch recently, and they performed well on Dutch. Only the smallest, such as qwen 3.6 27B, made up words and switched languages.

2 comments

numeri 4 days ago

There's a large gap between making up words and an actually native text distribution. LLMs have a clear pattern, clear tells, a "feel" in English, and it's normally even more pronounced in non-English languages.

Lots of bias towards English sentence structure, idioms, etiquette, etc.

tgv 1 day ago

I didn't notice any of that. Such a bias would be strange, because certainly smaller models don't have the luxury of learning grammar independently: it's still word sequences, and languages are quite well separated.

dvdkon 4 days ago

There would be a bunch of value in having, say, a good 30B-class model that used my local language as well as it does English. There are lots of cases, especially in the government sphere, where local processing is a requirement and frontier-level capabilities aren't. Making those cheap to run seems like a fine goal.

throw310822 4 days ago

Can you provide some examples of these use cases?

bigfudge 4 days ago

Support bots and question answering with access to sensitive PII?

throw310822 4 days ago

Yes, but what's the point of a support bot that writes good Dutch when it can't follow instructions, doesn't understand the questions or can't solve problems? I might be wrong, but I don't think atm these models have the cognitive ability to perform any task in a satisfactory manner.

As for accessing PII, I imagine the value here is in the fact that they're local, which has nothing to do with the "sovereignty" of these models. If anything, a model is more likely to be tricked by a malicious prompt the farther it is from the SOTA.

bigfudge 3 days ago

A good harness and good engineering are important no matter which model you use. But sovereignty of hosting is also important, because without it all PII is being leaked.
