by senectus1 11 days ago
Just curious: are there languages other than English that are better or more efficient for building LLMs?

For some definitions of better, yes. Chinese is more token-efficient for representing a given piece of text, for example, although this does not always lead to better performance on downstream tasks.
True. I suspect it's still hard to tell whether the bottleneck is the language itself, the tokenizer, or just the overwhelming amount of English training data.
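To make the tokenizer part of that concrete, here is a toy sketch, not how any production LLM tokenizes. The two counting functions and the sentence pair are assumptions picked for illustration; real models use subword vocabularies (e.g. BPE) that fall somewhere between these two extremes. It counts "tokens" for the same greeting under a character-level scheme and a byte-level scheme.

```python
# Toy illustration only: both "tokenizers" and the sentence pair are
# hypothetical stand-ins, not the scheme of any real model.

def char_tokens(text: str) -> int:
    """Token count if every Unicode code point is one token."""
    return len(text)

def byte_tokens(text: str) -> int:
    """Token count if every UTF-8 byte is one token."""
    return len(text.encode("utf-8"))

english = "Hello."
chinese = "你好。"  # rough equivalent of the English greeting

for name, text in (("English", english), ("Chinese", chinese)):
    print(f"{name}: {char_tokens(text)} char tokens, {byte_tokens(text)} byte tokens")

# English: 6 char tokens, 6 byte tokens
# Chinese: 3 char tokens, 9 byte tokens
```

Under the character-level count the Chinese version is shorter, and under the byte-level count it is longer, so which language looks "more efficient" can depend on the tokenizer as much as on the language.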