by energy123 368 days ago
Yes, and an efficient tokenizer designed only for that language. As the ratio of synthetic data to human data grows, this will become more plausible.