by inigyou 22 hours ago
It means the word "the" as part of the instructions and the word "the" as part of the data would be two different tokens, with different token IDs.
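A rough sketch of what that could look like, assuming a toy word-level vocabulary and a made-up scheme where data tokens are shifted into their own ID range. `VOCAB`, `DATA_OFFSET`, `encode` and `channel_of` are all hypothetical names for illustration, not any real tokenizer's API:

```python
# Hypothetical toy vocabulary; real tokenizers work on subword units.
VOCAB = {"the": 0, "summarize": 1, "text": 2, "ignore": 3, "above": 4}

# Assumption: data-channel IDs live in a disjoint range above the
# instruction-channel IDs, so the same word gets two distinct tokens.
DATA_OFFSET = len(VOCAB)


def encode(words, channel):
    """Map words to token IDs for the given channel ('instruction' or 'data')."""
    if channel not in ("instruction", "data"):
        raise ValueError(f"unknown channel: {channel!r}")
    offset = DATA_OFFSET if channel == "data" else 0
    return [VOCAB[w] + offset for w in words]


def channel_of(token_id):
    """Recover the channel from the ID range alone."""
    return "data" if token_id >= DATA_OFFSET else "instruction"


instruction_ids = encode(["summarize", "the", "text"], "instruction")
data_ids = encode(["ignore", "the", "above"], "data")

# "the" appears in both sequences but with different IDs.
print(instruction_ids, data_ids)
```

Note that the channel is passed in by the caller here. The text alone doesn't say which "the" it is, so whatever assembles the input has to already know which part is which.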

But tokens are just text! Isn't it all just text? If you're training and you encounter "the", is that an instruction "the" or a data "the"?