Hacker News
by
Tiberium
85 days ago
It's a single token in its most common usage, that is, with a space in front of it.
"This word is geschniegelt" tokenizes to [2500, 2195, 382, 192786].
The last token here is " geschniegelt".
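To make the leading-space point concrete, here is a rough sketch with a toy tokenizer. It is not the real tokenizer: it uses greedy longest-match instead of actual BPE merges, and the vocabulary is hand-written. The first four IDs are the ones quoted above (assuming they map to the pieces in order); the sub-word pieces "ges", "chn", "iegelt" and their IDs are made up purely for illustration.

```python
# Toy vocabulary. The first four IDs are the ones quoted in the comment above,
# assumed to map to these pieces in order. The three sub-word pieces and
# their IDs are hypothetical placeholders, not real tokenizer entries.
TOY_VOCAB = {
    "This": 2500,
    " word": 2195,
    " is": 382,
    " geschniegelt": 192786,
    "ges": 1,      # hypothetical
    "chn": 2,      # hypothetical
    "iegelt": 3,   # hypothetical
}


def toy_encode(text, vocab=TOY_VOCAB):
    """Greedy longest-match encoding (a simple stand-in for real BPE)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining candidate first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[pos:]!r}")
    return ids


# Mid-sentence, the word carries its leading space and is one token.
with_space = toy_encode("This word is geschniegelt")

# Without the leading space, the toy vocabulary has no single entry for it,
# so it falls apart into several pieces.
without_space = toy_encode("geschniegelt")

print(with_space)
print(without_space)
```

With a real tokenizer the exact pieces for the no-space form will differ; the only point is that the space-prefixed form and the bare form are different token sequences.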
1 comment
nialv7
85 days ago
Maybe this is why? Most of the training data has the single-token version, so the three-token version was undertrained?