| HN Mirror



	by senectus1 11 days ago
	Just curious: are there languages other than English that are better or more efficient for building LLMs?

1 comments

_jayhack_ 11 days ago

For some definitions of better, yes. Chinese is more token-efficient for representing the same text, for example, although this does not always lead to better performance on downstream tasks.


kmori_de 11 days ago

True. I suspect it's still hard to tell whether the bottleneck is the language itself, the tokenizer, or just the overwhelming amount of English training data.
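
To make the tokenizer point concrete, here is a rough sketch (not a real LLM tokenizer) that counts the same sentence under two extreme schemes: one token per character, and one token per UTF-8 byte with no merges. The sample sentences and the helper name `unit_counts` are my own assumptions for illustration; real BPE or unigram tokenizers land somewhere between these two extremes, depending on their training data.

```python
# Crude proxies only: real LLM tokenizers (BPE, unigram) fall between these extremes.
def unit_counts(text: str) -> dict[str, int]:
    """Count units under two hypothetical tokenization schemes."""
    return {
        "chars": len(text),                        # one token per character
        "utf8_bytes": len(text.encode("utf-8")),   # one token per byte, no merges
    }

# Assumed sample pair; the Chinese line is a rough translation of the English one.
english = "The weather is nice today."
chinese = "今天天气很好。"

for name, text in [("English", english), ("Chinese", chinese)]:
    print(name, unit_counts(text))
```

For this pair the Chinese sentence is 7 characters against 26 for English, but 21 UTF-8 bytes against 26, so most of the apparent advantage disappears under a byte-level scheme. That is one reason it is hard to separate the language from the tokenizer.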
