by bitRAKE 1218 days ago
The 1.4T tokens figure is the amount of data the model was trained on, not the token range of the embedding (the vocabulary size).

Ah, that makes more sense, thank you. Since the figure appeared in the tokenizer section and the number of unique tokens wasn't given, I misunderstood.
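
For anyone else who trips over the same distinction, here is a toy sketch of the two numbers. The whitespace tokenizer and the two-line corpus are assumptions made up for illustration; they have nothing to do with the actual model's tokenizer or data.

```python
# Toy illustration: training-token count vs. vocabulary size.
# The corpus and the whitespace "tokenizer" are hypothetical stand-ins.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

def tokenize(text):
    # Split on whitespace; a real tokenizer would use subword units.
    return text.split()

tokens = [tok for line in corpus for tok in tokenize(line)]

# Total tokens seen during training (the analogue of the 1.4T figure).
total_training_tokens = len(tokens)

# Distinct tokens: this is what sets the number of embedding rows.
vocab = sorted(set(tokens))
vocab_size = len(vocab)

print("training tokens:", total_training_tokens)
print("vocabulary size:", vocab_size)
```

The first number grows with the amount of training data, while the second is fixed once the tokenizer is built.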