by ryanmcbride 559 days ago
My go-to for showing people who don't understand its limitations used to be the old "how many Ms are there in the word 'minimum'" or something along those lines, but it looks like it's gotten a bit better at that. I just tried it with GPT-4o and it gave me the right number, but the wrong placement. In the past it got it completely wrong:

>how many instances of the letter L are in the word parallel

The word parallel contains 3 instances of the letter "L":

    The first "L" appears as the fourth letter.
    The second "L" appears as the sixth letter.
    The third "L" appears as the seventh letter.
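For reference, here is a short Python check of where the L's actually fall, counting from 1 the way people do. They sit at positions 5, 6 and 8, not 4, 6 and 7 as in the quoted answer. The helper name `letter_positions` is just for illustration.

```python
def letter_positions(word: str, letter: str) -> list[int]:
    """Return the 1-indexed positions of `letter` in `word`, ignoring case."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]

positions = letter_positions("parallel", "L")
print(len(positions), positions)  # 3 [5, 6, 8]
```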

I wonder if that is simply a reflection of there being more training data out there about this "problem" and the model hoovering all that up and regurgitating it?
These kinds of letter tricks aren't great examples to use because they're fixable and might stop working at any moment, as you just discovered. Modern image generation models, for instance, not only take embeddings of the input but also have parameters for each individual letter in words, which is why the latest models are much better at copying words into images: they see the spellings encoded as input parameters as well as the concept vectors.
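To illustrate the difference being described, here is a toy sketch that assumes a made-up subword vocabulary (`TOY_VOCAB` is hypothetical, not any real model's tokenizer). A token-only model sees "parallel" as two opaque IDs, while an extra per-character input exposes the spelling directly.

```python
# Hypothetical toy vocabulary: these pieces and IDs are invented for illustration.
TOY_VOCAB = {"par": 101, "allel": 102, "p": 1, "a": 2, "r": 3, "l": 4, "e": 5}

def greedy_tokenize(word: str, vocab: dict[str, int]) -> list[str]:
    """Split `word` into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return pieces

tokens = greedy_tokenize("parallel", TOY_VOCAB)
token_ids = [TOY_VOCAB[t] for t in tokens]  # token view: [101, 102], no letters visible
char_ids = [ord(c) for c in "parallel"]     # character view: one entry per letter

print(tokens, token_ids)          # ['par', 'allel'] [101, 102]
print(char_ids.count(ord("l")))   # 3, readable only from the character view
```

Nothing in the IDs 101 and 102 says how many L's they contain; the model would have to have memorized that, whereas the character view makes the count a direct lookup.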

Even things like hallucinated citations aren't going to work as a demonstration long term. The models are being trained to emit URLs only from their context, not from memory.

It's fine to show people the current state of things, but if you're trying to get students to understand why they shouldn't use AI the only technique that will work is this: "you will be examined in a controlled room where computers and phones aren't allowed". Nothing else will be effective over the medium to long term.

Here's QwQ-32B answering the same question. I'm including the entire answer, chain of thought (CoT) and all:

> Let's see. The word is "parallel." I need to count how many times the letter "L" appears in it. Okay, spelling it out: P-A-R-A-L-L-E-L. So, starting from the beginning, first letter is P, not L. Then A, not L. R, not L. Another A, still not L. Then L, that's one L. Next letter is L again, so that's two L's. Then E, not L, and another L, making it three L's. Wait, but when I spelled it again, I saw P-A-R-A-L-L-E-L, and I counted three L's. But let me check again. Parallel: P-A-R-A-L-L-E-L. Yes, L appears three times. So, the answer is three.

> But, maybe there's a trick here. Is there any special consideration for the letter L in this word or in general? Sometimes questions like this might be testing for homophones or something, but I don't think so in this case. It's straightforward: count the L's in "parallel."

> Alternatively, perhaps it's a trick with the word itself. Is "parallel" spelled differently sometimes? I don't think so. It's always P-A-R-A-L-L-E-L. So, three L's.

> Wait a minute, is there a possibility that the word is misspelled in the question? If the word was "paralell," that would be four L's, but that's not the correct spelling. So, sticking with three L's.

> Another angle: is the question referring to uppercase or lowercase L? But in the word "parallel," all letters are lowercase unless it's at the beginning of a sentence, but the question just says "the word parallel," which is typically in lowercase. So, doesn't matter.

> Also, considering that "l" and "I" can sometimes be confused, but in this word, it's L's only, no I's.

> So, my final answer is three instances of the letter L in the word "parallel."

> Final Answer: 3
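The chain of thought above spends several paragraphs worrying about uppercase vs lowercase and about "l" vs "I". A deterministic count settles both in a couple of lines; this is a minimal Python sketch, with `count_letter` as an illustrative name.

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, ignoring case."""
    return word.casefold().count(letter.casefold())

for spelling in ("parallel", "Parallel", "PARALLEL"):
    print(spelling, count_letter(spelling, "L"))  # 3 each time

# Lowercase "l" and uppercase "I" look alike in some fonts but are different characters.
print(count_letter("parallel", "I"))  # 0
```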

This AI is talking as if it is in the final stages of a multi-day interrogation by a very hush-hush and deniable branch of government. I give it another five minutes before it breaks down and sells out its mother just to be allowed to sleep with the lights off.

I hope that this is not something I ever have to deal with. It's exhausting.

funniest thing I’ve ever read on HN
> If the word was "paralell," that would be four L's, but that's not the correct spelling.

Better, but this is still a hallucination: "paralell" also has three L's, not four.
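The misspelling "paralell" only swaps the positions of two letters, so it contains exactly the same letters in the same quantities as "parallel". A quick Python check:

```python
for spelling in ("parallel", "paralell"):
    print(spelling, spelling.count("l"))  # 3 for both

# Same multiset of letters: the misspelling is just a rearrangement.
print(sorted("parallel") == sorted("paralell"))  # True
```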

Shows nicely what's going on.

If you ask a human, they will answer 3. Sometimes they say 4. Or 2. That's it.

An LLM produces text modeled on the examples it was trained on. They were trained with these elaborate responses, so that's what they produce.

Whenever ChatGPT gets something wrong, someone at OpenAI will analyse it, create a few correct examples, and put these on the pile for retraining. That's why it gets better: not because it is smarter, but because it's retrained on your specific test cases.

They probably have a letter-counting tool added to it now that it just knows to call when asked to do this.

You ask it the number of letters and it sends the word off to another tool to count instances of L, but they didn't add one for placement, so it's still guessing those.

edit: corrected some typos and phrasing.
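A minimal sketch of the setup being described, assuming a registry that has a counting tool but no placement tool. The names `count_letter`, `TOOLS` and `answer` are hypothetical; this is not OpenAI's actual tool interface, and a reply further down disputes that such a tool exists at all.

```python
def count_letter(word: str, letter: str) -> int:
    """Hypothetical tool: count occurrences of `letter` in `word`, ignoring case."""
    return word.lower().count(letter.lower())

# Hypothetical registry: only counting is available, no placement tool.
TOOLS = {"count_letter": count_letter}

def answer(tool_name: str, **kwargs):
    """Dispatch to a registered tool, or return None if none exists."""
    tool = TOOLS.get(tool_name)
    if tool is None:
        # No tool to call: the model would fall back to guessing from its weights.
        return None
    return tool(**kwargs)

print(answer("count_letter", word="parallel", letter="L"))      # 3
print(answer("letter_positions", word="parallel", letter="L"))  # None
```

Under this assumption the count comes back exact while the positions are left to the model, which would match the right-number, wrong-placement behaviour in the original post.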

Maybe we'll reach a point where LLMs are just tool-calling models and don't really give their own replies.

There are only 5 tools it has available to call, and that isn't one of them. A GitHub repo (forgot the URL) stays up to date with the latest dumped system instructions.
I can't speak to all LLMs, but OpenAI has a built-in Python interpreter. Assuming it recognizes the problem as "tokenization counting", it doesn't need a dedicated tool.
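With a general-purpose interpreter, one standard-library call tallies every character at once, so no letter-specific tool is needed. This is a sketch of the kind of code it could run, not what ChatGPT actually executes.

```python
from collections import Counter

# One call tallies every character in the word.
counts = Counter("parallel")
print(counts["l"])  # 3
print(counts["a"])  # 2
print(counts["z"])  # 0: Counter returns zero for missing keys
```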
How do we know they’re the real system instructions? If they’re determined by interrogating the LLM, hallucination is a very real possibility.
They probably just forgot to tell it that humans are 1-indexed and to do the friendly conversion for them.
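A quick way to test the indexing theory, assuming the model meant zero-based positions: the L's would then sit at 4, 5 and 7, while 1-based positions are 5, 6 and 8, so the claimed 4, 6, 7 from the original post matches neither convention exactly.

```python
word, letter = "parallel", "l"
claimed = [4, 6, 7]  # positions given in the quoted GPT-4o answer

zero_indexed = [i for i, ch in enumerate(word) if ch == letter]          # [4, 5, 7]
one_indexed = [i for i, ch in enumerate(word, start=1) if ch == letter]  # [5, 6, 8]

print(claimed == zero_indexed, claimed == one_indexed)  # False False
```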