Hacker News
by nialv7, 81 days ago
Maybe this is why? Most of the training data uses the single-token version, so the three-token version was undertrained?