Hacker News
by crop_rotation 1178 days ago
Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.
4 comments

>Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.

Agreed. GPT-3 and GPT-3.5 commonly hallucinate. GPT-4 can certainly be made to behave badly, but on real questions I've put to GPT-4 it has a 0% hallucination rate. The few wrong answers it has given have been "sensibly wrong" in that it's highly likely an experienced human programmer would have made the same mistake (e.g. lots of Stack Overflow answers are wrong in the same way), and even its wrong answers have been helpful in guiding me towards the correct solution.

These occasional, "sensibly wrong" GPT-4 answers are fundamentally different from the hallucinated "answers" I've received from GPT-3 and GPT-3.5: correctly formatted academic bibliography citations for technical papers that never existed, by authors that never existed, in journals that never existed.

I mean here's another example from right now regarding Terraform:

> Me: how to only run data "archive_file" if a path exists?

> GPT4: <blah blah blah> add: depends_on = [fileexists("/path/to/file")]

This is nonsense. Terraform tells me:

> A single static variable reference is required: only attribute access and indexing with constant keys. No calculations, function calls, template expressions, etc are allowed here

I just get this rubbish all too often to be afraid for my job.
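
For reference, the error arises because depends_on only accepts static references, not function calls. One common workaround is to gate the data source with count instead. A minimal sketch, assuming a hypothetical source path and output path (local.src_file and the name "maybe" are illustrative, and fileexists is evaluated at plan time, so the file has to exist before Terraform runs):

```hcl
locals {
  # Hypothetical path, for illustration only.
  src_file = "${path.module}/path/to/file"
}

# Create the archive only when the source file exists at plan time.
data "archive_file" "maybe" {
  count = fileexists(local.src_file) ? 1 : 0

  type        = "zip"
  source_file = local.src_file
  output_path = "${path.module}/maybe.zip"
}

# Downstream references must handle the zero-instance case;
# one() returns null when the list is empty.
output "archive_path" {
  value = one(data.archive_file.maybe[*].output_path)
}
```

Anything that consumes the archive then has to cope with a null path when the file is missing, which is the trade-off of this approach.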

My experience has been different. It very often hallucinates variable or function identifiers for me. I have never seen it do that for code on the first output. But once the chat/context grows larger and I have already asked for modifications to the posted code, it happens quite often.

A non-code example: Some days ago I asked it about "Searle's Wall" [0]. It gave me a mashup of the correct description and the Chinese Room experiment. So it clearly had the right answer somewhere in its data, but it mixed it up with the much more famous thought experiment.

[0]: https://www.researchgate.net/publication/260138925_Searle's_...

I was trying to create an AWS IAM policy restricted to assumed roles. I didn't know you can't use assumed roles in principalArn condition blocks but must refer to the IAM role instead. GPT-4 happily wasted an hour by shuffling the conditions around etc. instead of telling me this. Sometimes its policies were even malformed, and in all cases they didn't work.

Did you try giving GPT4 the relevant documentation along with your query?

Without wanting to sound arsey, why should I have to? If I'd had the relevant docs to hand I wouldn't have needed it.

This is the problem IMO. The model needs to somehow learn that out of its entire training set, the single sentence in the AWS docs saying not to use the assumed role ARN takes precedence over any patterns it may have learnt elsewhere in this specific situation.
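
To make that concrete: a condition on aws:PrincipalArn is matched against the ARN of the IAM role itself, not the ARN of the assumed-role session. A rough sketch in Terraform, assuming a hypothetical account ID, role name, action, and bucket; it only shows the shape of the condition and is not the policy I was actually writing:

```hcl
data "aws_iam_policy_document" "example" {
  statement {
    effect = "Allow"

    # Hypothetical action and resource, for illustration only.
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::example-bucket/*"]

    condition {
      test     = "ArnEquals"
      variable = "aws:PrincipalArn"

      # Use the IAM role ARN here. The session ARN, e.g.
      # arn:aws:sts::123456789012:assumed-role/example-role/session-name,
      # is not what this condition key is compared against.
      values = ["arn:aws:iam::123456789012:role/example-role"]
    }
  }
}
```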

What you are describing is very different from the hallucination behavior of GPT-3 and GPT-3.5.

Yes, GPT-4 came up with an incorrect answer, but it's an incorrect answer an experienced programmer could legitimately have come up with, and one they probably would have come up with before actually testing their code against the AWS endpoints. GPT-4 sometimes gets hard questions wrong. GPT-3 and GPT-3.5 make up nonsense.

If a coworker told you GPT-4's answer, you'd say they were wrong but you wouldn't say they were hallucinating. If a co-worker gave you GPT-3 or GPT-3.5's answer you'd definitely doubt their sanity.

Yeah sure. But the OP is in fear of their job. I think there's a fair way to go until we're out of work. Wrong is still wrong.

Ask it to show you an example of how to use the nom parser in Rust, then try to compile and run the example.

Try a few times and ask for a more complex example.

For example, I tried it on a really simple task and it failed. It cannot generate correct CSS selectors. That makes sense, as it doesn't understand specificity (like most humans, haha).