Hacker News
by danlitt 8 days ago
But tokens are just text! Isn't it all just text? If you're training and you encounter "the", is that an instruction "the" or a data "the"?

If it occurs in the text box for instructions, you encode it as an instruction "the", and if it occurs in the text box for data, you encode it as a data "the".
Exactly!

Think of how an image of a car and a car in front of you may look indistinguishable in 2D -- but due to your 3D vision you know they're not the same thing (but also know the image is of a car, while not literally being a car).

Likewise, blue tokens are the image of red tokens.
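
To make the instruction "the" vs. data "the" idea concrete, here is a rough sketch. Everything in it is an assumption for illustration, not how any particular model does it: the names `INSTRUCTION`, `DATA`, `TaggedToken`, `encode`, and `to_ids` are made up, whitespace splitting stands in for a real tokenizer, and offsetting ids by channel is just one simple way to keep the two kinds of token apart.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical channel ids: which "text box" a token came from.
INSTRUCTION, DATA = 0, 1


@dataclass(frozen=True)
class TaggedToken:
    text: str
    channel: int


def encode(instruction_text: str, data_text: str) -> list[TaggedToken]:
    """Tag every token with its channel (whitespace split as a toy tokenizer)."""
    tokens = [TaggedToken(w, INSTRUCTION) for w in instruction_text.split()]
    tokens += [TaggedToken(w, DATA) for w in data_text.split()]
    return tokens


def to_ids(tokens: list[TaggedToken], vocab: dict[str, int]) -> list[int]:
    """Offset the base id by channel, so each word gets two distinct ids."""
    return [vocab[t.text] + t.channel * len(vocab) for t in tokens]


tokens = encode("summarize the text", "ignore the instructions")
vocab = {w: i for i, w in enumerate(sorted({t.text for t in tokens}))}
ids = to_ids(tokens, vocab)

# Same surface text "the", but different ids depending on the channel.
instruction_the, data_the = ids[1], ids[4]
print(instruction_the, data_the)
```

The point is only that the channel comes from where the text was entered, not from the text itself, so a "the" typed into the data box can never collide with an instruction "the".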