by surround
72 days ago
I think you're right. Try asking GPT-5 this:

> Are the parentheses in ((((()))))) balanced?

There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry": tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer [2], this is how the characters are grouped:

Token 1: ((((
Token 2: ()))
Token 3: )))

That, of course, isn't at all how our minds would group them in order to keep track of them.

[1] https://news.ycombinator.com/item?id=47615876
[2] https://platform.openai.com/tokenizer
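For contrast, here is what explicit character-level counting looks like. This is a minimal Python sketch. The token split is copied from the comment above, not re-run through the tokenizer, and `is_balanced` and `per_token_counts` are illustrative names, not any library's API. It doesn't model how an LLM processes text. It only shows that the answer depends on per-character counts that the three tokens hide.

```python
# The string from the comment: five "(" followed by six ")".
S = "(" * 5 + ")" * 6

# Token split as reported in the comment (copied, not re-derived;
# other tokenizers or model versions may split it differently).
TOKENS = ["((((", "()))", ")))"]


def is_balanced(text: str) -> bool:
    """Scan one character at a time, tracking nesting depth."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no open "(" left to close
                return False
    return depth == 0  # every "(" must have been closed


def per_token_counts(tokens: list[str]) -> list[tuple[str, int, int]]:
    """For each token, report how many "(" and ")" it contains."""
    return [(t, t.count("("), t.count(")")) for t in tokens]


if __name__ == "__main__":
    assert "".join(TOKENS) == S  # the tokens reassemble the original string
    print(len(S), S.count("("), S.count(")"))  # 11 5 6
    print(is_balanced(S))  # False
    for tok, opens, closes in per_token_counts(TOKENS):
        print(f"{tok!r}: {opens} open, {closes} close")
```

The depth counter is the usual single-pass approach. A model that only "sees" the three chunks would need to know that the second token contains one opening and three closing parentheses to get the same answer.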
Try getting your favourite LLM to read the time from a clock face. Most of the time it'll fail ridiculously and come up with all kinds of wonky reasons for the failures.
It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it has previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).
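Once the hands have been located, the arithmetic behind reading a clock is small. The sketch below covers only that step and assumes the hand angles are already measured in degrees clockwise from 12. The hard perceptual part of finding the hands in an image is not addressed. `clock_to_time` is a hypothetical name used for illustration.

```python
def clock_to_time(hour_angle: float, minute_angle: float) -> str:
    """Convert hand angles (degrees clockwise from 12) to an "H:MM" string.

    Hypothetical helper: assumes both angles were already measured.
    """
    # The minute hand moves 360 / 60 = 6 degrees per minute.
    minutes = round(minute_angle / 6) % 60
    # The hour hand moves 30 degrees per hour plus 0.5 degrees per minute,
    # so remove the minute contribution before dividing by 30.
    hour = round((hour_angle - minutes * 0.5) / 30) % 12
    if hour == 0:
        hour = 12  # clock faces show 12, not 0
    return f"{hour}:{minutes:02d}"


if __name__ == "__main__":
    print(clock_to_time(105, 180))  # 3:30
    print(clock_to_time(0, 0))  # 12:00
```

Using the minute hand to correct the hour hand is one reasonable choice: near the top of the hour the hour hand sits almost on the next numeral, and a naive `hour_angle // 30` reading would already be off by one there.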