Hacker News
by kmori_de 11 days ago
True. I suspect it's still hard to tell whether the bottleneck is the language itself, the tokenizer, or just the overwhelming amount of English training data.