by surround 72 days ago
I think you're right. Try asking GPT-5 this:

> Are the parentheses in ((((()))))) balanced?

There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry." Tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer [2], this is how the characters are grouped:

Token 1: ((((

Token 2: ()))

Token 3: )))

Which of course isn't at all how our minds would group them together in order to keep track of them.
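
To make the grouping concrete, here is a minimal sketch that tallies parentheses per token. It hard-codes the three tokens quoted above rather than calling a real tokenizer, and the variable names are made up for illustration.

```python
# Token grouping copied from the post above; this is not produced by
# calling any tokenizer library.
tokens = ["((((", "()))", ")))"]

# Joining the tokens reconstructs the original string.
text = "".join(tokens)

# Count opening and closing parentheses inside each token.
per_token = [(tok, tok.count("("), tok.count(")")) for tok in tokens]

total_open = sum(opens for _, opens, _ in per_token)
total_close = sum(closes for _, _, closes in per_token)

for tok, opens, closes in per_token:
    print(f"{tok!r}: {opens} open, {closes} close")
print(f"total: {total_open} open, {total_close} close")
```

Token 2 mixes an opening parenthesis with closing ones, so the nesting depth has to be tracked across token boundaries, and the totals (5 open, 6 close) only show up once every token is unpacked character by character.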

[1] https://news.ycombinator.com/item?id=47615876
[2] https://platform.openai.com/tokenizer

3 comments

This is mostly because people wrongly assume that LLMs can count things. Just because it looks like they can doesn't mean they can.

Try to get your favourite LLM to read the time from a clock face. It'll fail ridiculously most of the time, and come up with all kinds of wonky reasons for the failures.

It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it's previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).

Don’t ask the LLM to do that directly: ask it to write a program to answer the question, then have it run the program. It works much better that way.
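
For the parentheses question, the program could be as small as this. It's a sketch only; `is_balanced` is a name I made up, and it only handles round parentheses.

```python
def is_balanced(text: str) -> bool:
    """Return True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared with nothing left to close.
                return False
    return depth == 0


# The string from the post has 5 opening and 6 closing parentheses.
print(is_balanced("(" * 5 + ")" * 6))  # False
print(is_balanced("(" * 5 + ")" * 5))  # True
```

The counter works one character at a time, so token boundaries never come into it.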
But for Lisp, a more complex solution is needed. It's easy for a human Lisp programmer to keep track of which closing parenthesis corresponds to which opening parenthesis because the editor highlights matching pairs as they are typed. How can we give an LLM that kind of feedback as it generates code?
That's a different question than the one you asked. Are you saying LLMs are generating invalid LISP due to paren mismatching?
That's what the comment I was originally replying to was saying.
If the LLM is intelligent, why can’t it figure out on its own that it needs to write a program?
The answer is self-evident.
Does the AI's performance drop if it uses individual letters as tokens rather than the usual multi-character tokens?
Try asking an LLM a question like "H o w T o P r o g r a m I n R u s t ?" - each letter, separated by spaces, will be its own token, and the model will understand just fine. The issue is that computational cost scales quadratically with the number of tokens, so processing "h e l l o" is much more expensive than "hello". "hello" has meaning, "h" has no meaning by itself. The model has to waste a lot of computation forming words from the letters.
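
A back-of-the-envelope sketch of the quadratic part: self-attention compares every token with every token, so the number of pairs grows as n squared. The token counts below are assumptions for illustration (one token for "hello", five for "h e l l o"); a real tokenizer may split differently, and this ignores the parts of the model whose cost grows only linearly.

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token comparisons in one full self-attention pass."""
    return num_tokens * num_tokens


def relative_cost(tokens_a: int, tokens_b: int) -> float:
    """How many times more attention pairs input A needs than input B."""
    return attention_pairs(tokens_a) / attention_pairs(tokens_b)


# Assumed token counts, not measured with a real tokenizer.
word_tokens = 1      # "hello"
spelled_tokens = 5   # "h e l l o"

print(attention_pairs(word_tokens))                # 1
print(attention_pairs(spelled_tokens))             # 25
print(relative_cost(spelled_tokens, word_tokens))  # 25.0
```

Under those assumptions, five times the tokens means twenty-five times the attention pairs.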

Our brains also process text a whole word at a time, not letter by letter. The difference is that our brains are much more flexible than a tokenizer, and we can easily switch to letter-by-letter reading when needed, such as when we encounter an unfamiliar word.