My mental model is that if you give it real words, it uses approximately one token per word, and it may or may not know how many letters are in the word. It will have learned the letter count only if that information was in its training data, just like any other fact it learns about words. It is not counting the letters.
If you give it a gibberish word, it will represent it with one letter per token, and it can then more or less count the tokens to work out how many letters there are.
The net effect is that it looks like it can count letters in most words, real and fake. I'd expect it to do poorly with real but uncommon words.
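To make that mental model concrete, here is a toy sketch in Python. It is only an illustration of the idea, not how a real tokenizer or model works: `VOCAB`, `MEMORIZED_LENGTHS`, `toy_tokenize`, and `toy_letter_count` are all made-up names, and real tokenizers split text into subword pieces rather than strictly "whole word" or "single letters".

```python
# Toy illustration of the mental model above. Everything here is hypothetical.
VOCAB = {"banana", "letter", "token"}           # words that get a single token
MEMORIZED_LENGTHS = {"banana": 6, "letter": 6}  # letter counts "seen in training"; "token" is missing


def toy_tokenize(word):
    """Known words become one token; anything else falls back to one token per character."""
    return [word] if word in VOCAB else list(word)


def toy_letter_count(word):
    """Return a letter count the way the mental model describes, or None if it is not known."""
    tokens = toy_tokenize(word)
    if len(tokens) == 1 and tokens[0] in VOCAB:
        # A single opaque token hides the spelling, so only a memorized fact can answer.
        return MEMORIZED_LENGTHS.get(word)
    # Character-level tokens: counting tokens is the same as counting letters.
    return len(tokens)


for w in ["banana", "token", "xqzvb"]:
    print(w, toy_tokenize(w), toy_letter_count(w))
```

In this sketch the gibberish word is counted correctly because its letters are visible as separate tokens, while a real word is only answered correctly if its length happens to be in the memorized table.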