by azulster 631 days ago
Yes, you are missing that tokens aren't words: they are 2-3 letter groups, or chunks of arbitrary size, depending on the model.
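To make the subword point concrete, here is a minimal sketch of a greedy longest-match tokenizer. It is a simplified stand-in, not BPE or any real model's tokenizer, and the vocabulary `VOCAB` and the function `tokenize` are hypothetical, invented for illustration:

```python
# Toy greedy longest-match subword tokenizer.
# VOCAB is hypothetical and hand-picked; real models learn their vocabularies
# (e.g. with BPE) and would likely split words differently.
VOCAB = {"str": 0, "aw": 1, "berry": 2, "s": 3, "t": 4, "r": 5, "a": 6,
         "w": 7, "b": 8, "e": 9, "y": 10}

def tokenize(word: str) -> list[str]:
    """Split `word` by repeatedly taking the longest vocabulary entry
    that matches at the current position."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in VOCAB.
        for end in range(len(word), i, -1):
            if word[i:end] in VOCAB:
                pieces.append(word[i:end])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary entry matches {word[i:]!r}")
    return pieces

pieces = tokenize("strawberry")
ids = [VOCAB[p] for p in pieces]  # what the model actually receives
print(pieces)  # ['str', 'aw', 'berry']
print(ids)     # [0, 1, 2]
```

With this vocabulary, "strawberry" becomes the three IDs `[0, 1, 2]`. The word contains three r's, but nothing in those three IDs spells the letters out.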

Nope, I'm not missing that particular fact. I'm aware that sentences (and words) are split into tokens, which are then mapped to vectors.
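Strictly speaking, a token is an integer ID into the model's vocabulary, and the vector comes from looking that ID up in an embedding table. A minimal sketch of that lookup follows; the vocabulary, the dimension, and the names `embedding_table` and `embed` are hypothetical, and the random values stand in for learned weights:

```python
import random

# Hypothetical toy setup: a 4-entry vocabulary and 3-dimensional embeddings.
# Real models use vocabularies of tens of thousands of tokens and learned
# vectors with hundreds or thousands of dimensions.
VOCAB = {"str": 0, "aw": 1, "berry": 2, "s": 3}
EMBED_DIM = 3

random.seed(0)
# One row per token ID; random stand-ins, not trained weights.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
                   for _ in range(len(VOCAB))]

def embed(token_ids: list[int]) -> list[list[float]]:
    """Return the embedding vector for each token ID (a plain row lookup)."""
    return [embedding_table[i] for i in token_ids]

vectors = embed([VOCAB["str"], VOCAB["aw"], VOCAB["berry"]])
print(len(vectors), len(vectors[0]))  # 3 3
```

The lookup itself carries no information about the letters inside a token; whatever the model knows about spelling has to be encoded in those vectors.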

I don't understand how most LLMs can spell out words though, nor do I understand what is causing the failure to count characters in words. I was not convinced by the comment I was responding to.