by jonas21 125 days ago
If you say: "Generate a strong password", then Claude will do what's reported in the article.

If you say: "Generate a strong password using Python", then Claude will write code using the `secrets` module, execute it, and report the result, and you'll actually get a strong password.
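For illustration, here is a minimal sketch of the kind of code that second prompt might produce. The function name `generate_password`, the default length of 20, and the choice of alphabet are my own assumptions, not something taken from the article or from any actual Claude output.

```python
import secrets
import string


def generate_password(length: int = 20) -> str:
    """Return a password of `length` characters chosen uniformly at random."""
    # Hypothetical alphabet: ASCII letters, digits, and punctuation.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    # secrets.choice draws from the operating system's CSPRNG,
    # unlike random.choice, which uses a predictable PRNG.
    return "".join(secrets.choice(alphabet) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

The point is that the randomness comes from the operating system rather than from the model picking characters that merely look random.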

To get good results out of LLMs, it's helpful to spend a few minutes understanding how they (currently) work. This is a good example because it's so simple.


I think that "Generate a strong password" is a pretty clear and unambiguous instruction. Generating a password that can be easily recovered is a clear failure to follow that instruction.

Given that Claude already has the ability to write and execute code, it's not obvious to me why it should, in principle, need an explicit nudge. Surely it could just fulfil the first request exactly like it fulfils the second.

It's not actually thinking, though. There's no way for it to "know" its answer will be wrong, because it wasn't trained on content covering that.

Maybe in the future the companies making the models will train them specifically to recognize when a task requires a source of true randomness, and then the models might start writing code for it.

> It's not actually thinking, though.

That may well be, I genuinely don't know. However, consider the following thought experiment:

Ask a random stranger on the street[*] to "generate a random password" and observe their behaviour. Are they whipping out their Python interpreter or just giving you a string of characters?

Now ask yourself whether this random stranger is capable of thought.

I think it's pretty clear that the former is a poor test for the latter.

[*] someplace other than Silicon Valley :)

It's 2026, on Hacker News of all places, and people still think LLMs "know" stuff. We're doomed...