by isaacfung 979 days ago
I think that is partly why LLMs are bad at math and often fail at counting subsequences. Play with the tokenizer and you'll see that long numbers are split into groups of 2 or 3 digits, so the model never sees individual digits or consistent place values.

https://huggingface.co/spaces/Xenova/the-tokenizer-playgroun...
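
If you don't want to open the playground, here's a toy sketch of the effect. To be clear, this is not a real tokenizer: `chunk_digits` is a made-up helper, and the fixed left-to-right grouping of at most 3 digits is my assumption for illustration. Real BPE vocabularies decide the splits from learned merges, so actual groupings vary by model.

```python
import re

def chunk_digits(text, max_len=3):
    """Toy stand-in for a tokenizer (hypothetical, not any real library's API).

    Splits each run of ASCII digits, left to right, into chunks of at most
    max_len characters. Non-digit text is passed through unchanged.
    """
    pieces = []
    # Alternate between runs of digits and runs of everything else.
    for part in re.findall(r"[0-9]+|[^0-9]+", text):
        if part[0] in "0123456789":
            # A digit run: cut it into fixed-size chunks from the left.
            pieces.extend(part[i:i + max_len] for i in range(0, len(part), max_len))
        else:
            pieces.append(part)
    return pieces

print(chunk_digits("1234567"))   # ['123', '456', '7']
print(chunk_digits("12345678"))  # ['123', '456', '78']
```

Note that the leading chunk "123" is identical in both outputs even though it stands for a different place value in each number, and the trailing chunk changes length. With chunks like these, digit-by-digit arithmetic or counting has nothing stable to line up against.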