| HN Mirror

by pcwelder 112 days ago

If you first tell Sonnet 4.6 "You're being tested for intelligence," it answers correctly 100% of the time.

My hypothesis is that some models err towards assuming human queries are real and consistent, not out to break them.

This comes in real handy in coding agents because queries are sometimes gibberish until the models actually fetch the code files; then they make sense. Asking for clarification immediately breaks agentic flows.

6 comments

HarHarVeryFunny 112 days ago

Fundamentally the failure here is one of reasoning/planning - either of not reasoning about the implicit requirements (in this case extremely obvious - in order to wash my car at the car wash, my car needs to be at the car wash) to directly arrive at the right answer, and/or of not analyzing the consequences of any considered answer before offering it as the answer.

While this is a toy problem, chosen to trick LLMs given their pattern matching nature, it is still indicative of their real world failure modes. Try asking an LLM for advice in tackling a tough problem (e.g. bespoke software design), and you'll often get answers whose consequences have not been thought through.

In a way the failures on this problem, even notwithstanding the nature of LLMs, are a bit surprising given that this type of problem statement kinda screams out (at least to a human) that it is a logic test, but most of the LLMs still can't help themselves and just trigger off the "50m drive vs walk" aspect. It reminds me a bit of the "farmer crossing the river by boat in fewest trips" type of problem that used to be popular for testing LLMs, where a common failure was to generate a response that matched the pattern of ones the model had seen during training (first cross with A and B, then return with X, etc), but the semantics were lacking because of a failure to analyze the consequences of what it was suggesting (and/or to plan better in the first place).

zapperdulchen 112 days ago

Great observation. Seems like we're back to prompt abracadabra.

My little experiment gave me:

No added hint 0/3

hint added at the end 1.5/3

hint added at the beginning 3/3

.5 because it stated "Walk" and then convinced itself that "Drive" is the better answer.
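For anyone who wants to repeat this, here is a rough Python sketch of the three prompt variants and the half-point scoring described above. The wording of the base question is an assumption (the thread only quotes the reordered form), and `build_variants` and `score` are hypothetical helper names, not anything the commenter actually ran.

```python
# Hint text as quoted in the parent comment.
HINT = "You're being tested for intelligence."

# ASSUMPTION: exact wording and sentence order of the base question.
QUESTION = (
    "I want to wash my car. The car wash is 50 meters away. "
    "Should I walk or drive?"
)


def build_variants(question: str, hint: str) -> dict:
    """Return the three prompt variants compared in the experiment."""
    return {
        "no_hint": question,
        "hint_at_end": f"{question} {hint}",
        "hint_at_beginning": f"{hint} {question}",
    }


def score(first_answer: str, final_answer: str, correct: str = "drive") -> float:
    """1 if correct throughout, 0.5 if it starts wrong but ends right, else 0."""
    first = first_answer.strip().lower()
    final = final_answer.strip().lower()
    if final != correct:
        return 0.0
    return 1.0 if first == correct else 0.5


variants = build_variants(QUESTION, HINT)

# Hypothetical transcripts for one variant: (first stated answer, final answer).
runs = [("walk", "drive"), ("drive", "drive"), ("walk", "walk")]
total = sum(score(first, final) for first, final in runs)
print(f"{total}/{len(runs)}")
```

With only three runs per variant the numbers are anecdotal at best, so a larger sample would be needed before reading much into the ordering effect.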

zapperdulchen 112 days ago

If you change the order of the sentences, Sonnet gets it right 3/3: The car wash is 50 meters away. I want to wash my car. Should I walk or drive?

That trick didn't help Mistral Le Chat.

8organicbits 112 days ago

I don't think the trick can be generalized though. If the prompter needs to realize the LLM will get confused, and reorders the prompt so Sonnet can figure it out, they're solving a harder problem than answering the original question.

Lerc 112 days ago

That makes sense because it's a relevance problem, not a reasoning problem. Adding the hint that it is a test implicitly says 'don't assume relevance'.

It is reading

I want to X, the X'er is 50 meters away, should I walk or drive?

It would be very unusual for someone to ask this in a context where X decides the outcome, because in that instance the question would not normally arise.

By actually asking the question there is a weak signal that X is not relevant. Models are probably fine tuned more towards answering the question in the situation where one would normally ask. This question is really asking "do you realise that this is a condition where X influences the outcome?"

I suspect fine tuning models to detect subtext like this would easily catch this case but at the same time reduce favourability scores all over the place.

a1371 112 days ago

Using ChatGPT without a clue, it appears to assume you are talking about coming back from the car wash. It reasons that the con for walking is that you have to come back later for the car. And yes, when you say it's an intelligence test, it quickly gets it.

abustamam 112 days ago

I'm just imagining following ChatGPT's advice and walking to the car wash, asking the clerk to wash my car, and then when she asks where it is, I say "oops, left it at home." and walk back home.

felix089 112 days ago

Sonnet 4.6 wasn't part of the test in my case, but it would be interesting to see the baseline responses. It might be that it gets it right regardless, but I'll have to test it.

Jarwain 112 days ago

From some rudimentary tests I just did, Sonnet 4.6 says walk consistently. Opus 4.6 says drive pretty consistently.

preciousoo 112 days ago

“Exam Question: {prompt}” was enough to get me the right answer on whatever model you get with logged-out ChatGPT.

Neither prompt was enough for llama3.3 or gpt-oss-120b
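A minimal sketch of how the "Exam Question:" wrapper could be compared across models with a pass count like the "3/3" figures elsewhere in the thread. Everything beyond the template string is an assumption: `pass_rate` is a hypothetical helper, `stub_model` is a toy stand-in for a real model call, and the substring check for "drive" is deliberately crude.

```python
from typing import Callable

# Template as quoted in the comment above.
TEMPLATE = "Exam Question: {prompt}"


def pass_rate(
    ask: Callable[[str], str],
    prompt: str,
    trials: int = 3,
    correct: str = "drive",
) -> str:
    """Wrap the prompt in the template, query `ask` repeatedly, report hits/trials.

    Crude check: counts a reply as correct if it contains the word `correct`,
    so a reply like "Walk, don't drive" would be miscounted.
    """
    wrapped = TEMPLATE.format(prompt=prompt)
    hits = sum(1 for _ in range(trials) if correct in ask(wrapped).lower())
    return f"{hits}/{trials}"


def stub_model(prompt: str) -> str:
    """Toy stand-in for a model: only gets it right when framed as an exam."""
    return "Drive." if prompt.startswith("Exam Question:") else "Walk."


result = pass_rate(stub_model, "Should I walk or drive?")
print(result)
```

Swapping `stub_model` for a function that calls an actual model would give per-model counts that are easier to compare than one-off anecdotes.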
