by taink 1207 days ago
> Also, I'm not sure if it even treated individual digits as separate tokens, but it might. Someone with API access can check.

Anyone can check; OpenAI has a tool for that[1]. It's mentioned in their FAQ article[2].

According to their tool, GPT-3 counts the following as one token:

- any combination of 3 or fewer digits

- 1111, 3333, 6666, 9999 (it tends to group other digits in groups of 2)

- 66666666 (so 8 sixes -- 5, 6 or 7 won't work)

- 00000000 (any run of fewer than 8 zeros counts as one token as well, probably to handle millions and billions)

- 0000000000000000 (16 zeros)

This isn't an exhaustive list; there are probably a lot of other weird edge cases I haven't tried. GPT-3's failure to understand basic arithmetic makes much more sense given how inconsistently digits are tokenized.
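
For anyone who would rather script these checks than paste strings into the web tool, here is a minimal sketch. It assumes the `tiktoken` package is installed and that its `r50k_base` encoding is the one the tool uses for GPT-3; that match is an assumption and worth double-checking against the tool. The helper names (`token_counts`, `single_token_samples`, `report_with_tiktoken`) are made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Sequence

# Any function that turns a string into a sequence of tokens.
Encoder = Callable[[str], Sequence]


def token_counts(encode: Encoder, samples: Iterable[str]) -> Dict[str, int]:
    """Map each sample string to the number of tokens `encode` produces for it."""
    return {s: len(encode(s)) for s in samples}


def single_token_samples(encode: Encoder, samples: Iterable[str]) -> List[str]:
    """Return only the samples that `encode` turns into exactly one token."""
    return [s for s in samples if len(encode(s)) == 1]


def report_with_tiktoken() -> None:
    """Print token counts for the digit strings discussed in the comment.

    Assumption: tiktoken is installed and "r50k_base" corresponds to GPT-3.
    The import is kept local so the helpers above work without tiktoken.
    """
    import tiktoken

    encode = tiktoken.get_encoding("r50k_base").encode
    samples = ["123", "1111", "1234", "66666", "66666666", "00000000", "0" * 16]
    for text, count in token_counts(encode, samples).items():
        print(f"{text!r}: {count} token(s)")
```

Calling `report_with_tiktoken()` prints one line per digit string. The helpers take the encoder as a parameter, so the same checks can be run against a different encoding by passing another `encode` function.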

[1]: https://platform.openai.com/tokenizer

[2]: https://help.openai.com/en/articles/4936856-what-are-tokens-...