by shuri 723 days ago
One theory I've heard about this type of problem is that these algorithms tokenize the text early, and each token can span multiple characters.
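
To make the tokenization point concrete, here is a rough sketch of a toy tokenizer. The vocabulary and the greedy longest-match rule are made up for illustration; real tokenizers learn their vocabularies from data and use different algorithms, so treat this only as a picture of the idea that one token can cover several characters.

```python
# Toy vocabulary, invented for illustration only.
# Real tokenizers learn their vocabulary from a large corpus.
TOY_VOCAB = {"straw", "berry", "st", "raw", "s", "t", "r", "a", "w", "b", "e", "y"}


def toy_tokenize(text, vocab=TOY_VOCAB):
    """Split text into tokens by greedy longest match against vocab."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Nothing in the vocabulary matched: emit the single character.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = toy_tokenize("strawberry")
print(tokens)                          # ['straw', 'berry']
print(len("strawberry"), len(tokens))  # 10 2
```

With this toy vocabulary, a ten-character word reaches the model as two tokens, so anything downstream works with those two units and not with the individual letters.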