by hgsgm
1207 days ago
ChatGPT doesn't understand relationships between numbers well. There are far too many of them compared to words, since every slight perturbation of a number is a different valid number. (Also, I'm not sure if it even treats individual digits as separate tokens, but it might. Someone with API access can check.) To give it a fair shot, you need to describe the problem using logical, conceptual vocabulary, not numbers.

Anyone can check; OpenAI has a tool for that[1]. It's mentioned in their FAQ article[2].
According to their tool, GPT-3 counts the following as one token:
- any sequence of 3 or fewer digits
- 1111, 3333, 6666, 9999 (it tends to group other digits in groups of 2)
- 66666666 (so 8 sixes -- 5, 6 or 7 won't work)
- 00000000 (anything below 8 zeros counts as one token as well, probably to handle millions and billions)
- 0000000000000000 (16 zeros)
This isn't an exhaustive list; there are probably a lot of other weird edge cases I haven't tried. Its failure at basic arithmetic makes much more sense given how inconsistently digits are tokenized.
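To make the inconsistency concrete, here is a rough toy sketch in Python. It is not GPT-3's real tokenizer (that uses learned BPE merges, so real splits will differ; for instance, the tool reportedly splits other repeated digits into pairs, which this toy does not reproduce). It just does greedy longest-match over a hand-built vocabulary holding only the single-token digit strings listed above. The names `build_toy_vocab` and `toy_tokenize` are made up for illustration.

```python
# Toy illustration only: greedy longest-match over a hand-built vocabulary.
# This is NOT GPT-3's actual BPE tokenizer. The vocabulary contains only the
# digit strings reported above as single tokens.

def build_toy_vocab():
    vocab = set()
    # Every digit string of length 1 to 3 ("0".."9", "00".."99", "000".."999").
    for length in range(1, 4):
        for i in range(10 ** length):
            vocab.add(str(i).zfill(length))
    # The four-digit repeats reported as single tokens.
    vocab.update({"1111", "3333", "6666", "9999"})
    # Eight sixes.
    vocab.add("6" * 8)
    # Runs of 1 to 8 zeros, plus a run of 16 zeros.
    vocab.update("0" * n for n in range(1, 9))
    vocab.add("0" * 16)
    return vocab


def toy_tokenize(digits, vocab):
    """Split a digit string by repeatedly taking the longest vocab entry."""
    longest = max(len(token) for token in vocab)
    tokens = []
    pos = 0
    while pos < len(digits):
        for size in range(min(longest, len(digits) - pos), 0, -1):
            piece = digits[pos:pos + size]
            if piece in vocab:
                tokens.append(piece)
                pos += size
                break
        else:
            # No vocab entry matches here (e.g. a non-digit character).
            raise ValueError(f"cannot tokenize {digits[pos]!r} at position {pos}")
    return tokens


VOCAB = build_toy_vocab()

if __name__ == "__main__":
    for number in ["9999", "9998", "66666666", "6666666", "10000"]:
        print(number, "->", toy_tokenize(number, VOCAB))
```

Even in this simplified model, 9999 is one piece while 9998 becomes two, so numbers that differ by one end up with differently shaped token sequences. To see the real splits, paste the same strings into the tokenizer tool[1].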
[1]: https://platform.openai.com/tokenizer
[2]: https://help.openai.com/en/articles/4936856-what-are-tokens-...