by Ambix 1105 days ago
Tokens are just integers: each one is the position (index) of an entry in a big vocabulary. It's that simple :)

And the vocabulary is just an array / vector / list, depending on which programming language you use; each has its own name for that data structure.

For example, the LLaMA vocabulary has 32,000 tokens, so its token IDs run from 0 to 31,999.
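
To make that concrete, here's a rough Python sketch of the idea. The five-entry vocabulary and the `encode` / `decode` helpers are made up for illustration; a real tokenizer learns its vocabulary from data and also handles splitting the text into pieces, which is skipped here.

```python
# Toy vocabulary: a plain list, where a token's ID is simply its index.
# These entries are invented for illustration only; a real vocabulary
# (LLaMA's has 32,000 entries) is produced by training a tokenizer.
vocab = ["<unk>", "Hello", ",", " world", "!"]

# Reverse lookup: token string -> token ID (its position in the list).
token_to_id = {token: i for i, token in enumerate(vocab)}


def encode(pieces):
    """Map already-split text pieces to token IDs; unknown pieces map to 0."""
    return [token_to_id.get(piece, 0) for piece in pieces]


def decode(ids):
    """Map token IDs back to text by indexing into the vocabulary."""
    return "".join(vocab[i] for i in ids)


ids = encode(["Hello", ",", " world", "!"])
print(ids)          # [1, 2, 3, 4]
print(decode(ids))  # Hello, world!
```

So going from ID to token is just `vocab[i]`, and the dict is only there to make the opposite direction fast.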