| HN Mirror


by porridgeraisin 508 days ago

I don't think all this is needed to prove that LLMs aren't there yet.

Here is a simple trivial one:

"make ssh-keygen output decrypted version of a private key to another file"

I'm pretty sure everyone on the LLM hype train will agree that this prompt alone should be enough for GPT-4o to give a correct command. After all, it's SSH.

However, here is the output command:

  ssh-keygen -p -f original_key -P "current_passphrase" -N "" -m PEM -q -C "decrypted key output" > decrypted_key
  chmod 600 decrypted_key

Even the basic fact that ssh-keygen modifies the key file in place and does not write key data to stdout is not captured strongly enough in the representation to be activated by this prompt. So the command also overwrites the existing key, and your decrypted_key file will contain "your identification has been saved with the new passphrase", lol.
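To see the failure concretely, the suggested command can be run against a throwaway key. This is a minimal sketch, assuming OpenSSH's `ssh-keygen` and `mktemp` are installed; the scratch directory, the RSA test key and its passphrase are made up for the demo, and only the `ssh-keygen -p ... > decrypted_key` line comes from the post.

```shell
#!/bin/sh
# Reproduce the GPT-4o command in a scratch directory (test key is hypothetical).
set -eu
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Create a passphrase-protected RSA test key (RSA so that -m PEM applies).
ssh-keygen -q -t rsa -b 2048 -N "current_passphrase" -f original_key

# The suggested command: -p rewrites original_key in place, and the
# redirect only captures whatever ssh-keygen prints to stdout.
ssh-keygen -p -f original_key -P "current_passphrase" -N "" -m PEM -q \
  -C "decrypted key output" > decrypted_key
chmod 600 decrypted_key

# decrypted_key holds no key material...
if grep -q "PRIVATE KEY" decrypted_key; then
  echo "decrypted_key unexpectedly contains a key"
else
  echo "decrypted_key contains no key; its contents are:"
  cat decrypted_key
fi

# ...while the original now loads with an empty passphrase, i.e. it was overwritten.
if ssh-keygen -y -P "" -f original_key > /dev/null; then
  echo "original_key is no longer encrypted"
fi
```

The original encrypted key is gone after this runs, which is exactly why the command is dangerous on a real key.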

Maybe we should set up a cron job - sorry, chatgpt task - to auto-tweet this in reply to all of the openai employees' hype tweets.

Edit:

chat link: https://chatgpt.com/share/67962739-f04c-800a-a56e-0c2fc8c2dd...

Edit 2: Tried it on DeepSeek

With the prompt pasted as is, it gave the same wrong answer: https://imgur.com/jpVcFVP

However, with reasoning enabled, it caught the fact that the original file is overwritten in its chain of thought, and then gave the correct answer. Here is the relevant part of the chain of thought in a pastebin: https://pastebin.com/gG3c64zD

And the correct answer:

  cp encrypted_key temp_key && \
  ssh-keygen -p -f temp_key -m pem -N '' && \
  mv temp_key decrypted_key
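The same copy-then-decrypt idea can be wrapped in a small POSIX shell function. This is a sketch, not DeepSeek's output: the name `decrypt_key_copy` and the optional passphrase argument are made up for illustration, and it assumes OpenSSH's `ssh-keygen` and `mktemp` are available. It leaves out `-m pem`, so the copy is written in ssh-keygen's default format; add it back if PEM is required.

```shell
#!/bin/sh
# decrypt_key_copy SRC DST [PASSPHRASE]
# Hypothetical helper (not part of OpenSSH): writes an unencrypted copy of the
# private key SRC to DST and leaves SRC untouched. Without PASSPHRASE,
# ssh-keygen prompts for it; passing it as an argument exposes it in `ps`.
decrypt_key_copy() {
  src=$1
  dst=$2
  if [ $# -ge 3 ]; then
    set -- -P "$3"   # supply the old passphrase non-interactively
  else
    set --           # let ssh-keygen prompt for it
  fi
  # mktemp creates the file with mode 0600; ssh-keygen refuses private keys
  # that are readable by group or others.
  tmp=$(mktemp "$dst.XXXXXX") || return 1
  # Copying into the existing temp file keeps its 0600 mode. ssh-keygen -p then
  # rewrites the copy in place; its status message goes to stdout, so discard it.
  if cp "$src" "$tmp" &&
     ssh-keygen -p -f "$tmp" "$@" -N '' > /dev/null &&
     mv "$tmp" "$dst"; then
    return 0
  fi
  rm -f "$tmp"   # don't leave a partial copy behind on failure
  return 1
}
```

For example, `decrypt_key_copy encrypted_key decrypted_key` prompts for the old passphrase and leaves `encrypted_key` byte-for-byte unchanged. Creating the temp file next to the destination keeps the final `mv` a same-filesystem rename.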

I find it quite interesting that this seemingly 2020-era LLM problem is only solved correctly by the latest reasoning model, but cool that it works.

3 comments

Kim_Bruning 508 days ago

Ah, I see. You phrased it in a misleading way. And once misled, non-reasoning models can't/won't back up once they're down the wrong path.

Slight improvement:

"make ssh-keygen output decrypted version of a private key to another file . Use chain reasoning, think carefully to make sure the answer is correct. Summarize the correct commands at the end."

This improved the odds for me of getting the right answer in the format you were looking for in GPT-4o and Claude.

These things aren't magic oracles, they're tools.

porridgeraisin 508 days ago

What was misleading? It's a very reasonable prompt that contains all the information required to generate the rest of the answer.

I didn't ask or expect any format. The accurate answer in whatever format is all that is expected.

Kim_Bruning 508 days ago

It is likely to match on the typical command line pattern of redirecting stdout.

The way I see it: if/when it does so, a non-reasoning model can't (as easily) detect that this is an error, turn back, and go down another path.

The modified prompt improves the odds somewhat by making it easier to detect a problem early on and change course, but it's not a 100% guarantee.

Game_Ender 508 days ago

It looks like o1 also gets the right answer after thinking about it for 14 seconds: https://chatgpt.com/share/67962ead-a5f8-800a-bd91-9a145b993e...

ANewFormation 508 days ago

The thing that makes the puzzle neat is that it's one that a reasonably clever person who literally just learned the rules of chess should be able to solve.

There's no nuance to it whatsoever beyond needing to demonstrate knowledge of the rules of the game.

plorkyeran 507 days ago

I think you have completely forgotten what it is like to be a beginner at chess if you think that someone who has just learned the rules of the game would be able to identify that the best move is to underpromote a pawn to force a draw.

ANewFormation 507 days ago

It's not about forcing a draw but recognizing that the only reasonable move loses the queen immediately.

Assessing the ending is irrelevant. All one needs to know is that having 1 piece is better than having 0 pieces. That's not actually always true, but that's the beauty of this puzzle: you don't need anything other than the most basic logic to solve it correctly, at least for the first move.