Hacker News
by nialv7 81 days ago
Maybe this is why? Most of the training data contains the single-token version, so the three-token version ended up undertrained?