Hacker News
by shuri 723 days ago
One theory I heard about this type of problem is that these algorithms tokenize the text early, and each token can span multiple characters.
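As a rough illustration, assuming the problem involves character-level details (for instance, counting letters in a word), here is a toy greedy longest-match tokenizer. The vocabulary, the word "strawberry", and the `tokenize` helper are all hypothetical; real models learn their subword vocabularies (e.g. via byte-pair encoding) and split text differently. The point is only that once text becomes token IDs, the individual characters are no longer separate units in the input.

```python
# Hypothetical toy vocabulary: real models learn tens of thousands of
# subword pieces; these entries are invented for illustration.
VOCAB = {"straw": 0, "berry": 1, "s": 2, "t": 3, "r": 4, "a": 5, "w": 6,
         "b": 7, "e": 8, "y": 9}


def tokenize(text: str) -> list[str]:
    """Greedy longest-match split of text into vocabulary pieces."""
    max_len = max(len(piece) for piece in VOCAB)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, shrinking until one matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no vocabulary entry covers {text[i]!r}")
    return tokens


tokens = tokenize("strawberry")
ids = [VOCAB[t] for t in tokens]
print(tokens)  # ['straw', 'berry']
print(ids)     # [0, 1]
```

Under these assumptions, a model fed `[0, 1]` never sees the three `r` characters as distinct inputs; it would have to have learned the spelling of each multi-character token indirectly. Greedy longest match is a simplification chosen for brevity; byte-pair encoding instead applies learned merge rules, but it produces multi-character tokens in the same spirit.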