HN Mirror


	by crop_rotation 1178 days ago
	Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.

4 comments

yodon 1178 days ago

>Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.

Agreed. GPT-3 and GPT-3.5 commonly hallucinate. GPT-4 can certainly be made to behave badly, but on real questions I've put to GPT-4 it has a 0% hallucination rate. The few wrong answers it has given have been "sensibly wrong" in that it's highly likely an experienced human programmer would have made the same mistake (e.g., lots of Stack Overflow answers are wrong in the same way), and even its wrong answers have been helpful in guiding me towards the correct solution.

These occasional "sensibly wrong" GPT-4 answers are fundamentally different from the hallucinated "answers" I've received from GPT-3 and GPT-3.5: correctly formatted academic bibliography citations for technical papers that never existed, by authors that never existed, in journals that never existed.

nprateem 1177 days ago

I mean here's another example from right now regarding Terraform:

> Me: how to only run data "archive_file" if a path exists?

> GPT4: <blah blah blah> add: depends_on = [fileexists("/path/to/file")]

This is nonsense. Terraform tells me:

> A single static variable reference is required: only attribute access and indexing with constant keys. No calculations, function calls, template expressions, etc are allowed here

I get this rubbish too often to be afraid for my job.

graboid 1177 days ago

My experience has been different. It very often hallucinates variable or function identifiers for me. I have never seen it do that in the first output, but once the chat context grows larger and I have already asked for modifications to the posted code, it happens quite often.

A non-code example: Some days ago I asked it about "Searle's Wall" [0]. It gave me a mashup of the correct description and the Chinese Room experiment. So it clearly had the right answer somewhere in its data, but it mixed it up with the much more famous thought experiment.

[0]: https://www.researchgate.net/publication/260138925_Searle's_...

nprateem 1178 days ago

I was trying to create an AWS IAM policy restricted to assumed roles. I didn't know you can't use assumed roles in principalArn condition blocks but must refer to the IAM role instead. GPT-4 happily wasted an hour by shuffling the conditions around etc. instead of telling me this. Sometimes its policies were even malformed, and in all cases they didn't work.
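
For anyone hitting the same wall, here is a minimal Python sketch of the distinction described above, assuming the usual ARN shapes: an STS session ARN looks like arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION, while the condition expects the IAM role ARN arn:aws:iam::ACCOUNT:role/ROLE. The function names, the account ID and the role name are hypothetical, and the sketch cannot recover a role path (e.g. /service-role/) because the session ARN does not carry one.

```python
import re

# Shape of an STS assumed-role session ARN, e.g.
# arn:aws:sts::123456789012:assumed-role/MyRole/my-session
_SESSION_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):sts::(?P<account>\d{12}):"
    r"assumed-role/(?P<role>[^/]+)/(?P<session>[^/]+)$"
)


def role_arn_from_session_arn(session_arn: str) -> str:
    """Map an assumed-role session ARN to the ARN of the underlying IAM role.

    Assumption: the role has no path; the session ARN omits it, so a role
    created under a path cannot be reconstructed from this input alone.
    """
    match = _SESSION_ARN.match(session_arn)
    if match is None:
        raise ValueError(f"not an assumed-role session ARN: {session_arn!r}")
    return (
        f"arn:{match['partition']}:iam::{match['account']}"
        f":role/{match['role']}"
    )


def principal_arn_condition(session_arn: str) -> dict:
    """Build a Condition block that compares against the role ARN,
    not the session ARN (hypothetical helper for illustration)."""
    return {
        "ArnEquals": {
            "aws:PrincipalArn": role_arn_from_session_arn(session_arn)
        }
    }


if __name__ == "__main__":
    import json

    session = "arn:aws:sts::123456789012:assumed-role/MyRole/my-session"
    print(json.dumps(principal_arn_condition(session), indent=2))
```

This only builds the Condition fragment; the rest of the policy statement (Effect, Action, Resource) is left out on purpose, since the comment above does not say what the policy was meant to allow.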

ilaksh 1178 days ago

Did you try giving GPT4 the relevant documentation along with your query?

nprateem 1178 days ago

Without wanting to sound arsey, why should I have to? If I'd had the relevant docs to hand I wouldn't have needed it.

This is the problem IMO. The model needs to somehow learn that out of its entire training set, the single sentence in the AWS docs saying not to use the assumed role ARN takes precedence over any patterns it may have learnt elsewhere in this specific situation.

yodon 1178 days ago

What you are describing is very different from the hallucination behavior of GPT-3 and GPT-3.5.

Yes, GPT-4 came up with an incorrect answer, but it's an incorrect answer an experienced programmer could legitimately have come up with, and one they probably would have come up with before actually testing their code against the AWS endpoints. GPT-4 sometimes gets hard questions wrong. GPT-3 and GPT-3.5 make up nonsense.

If a coworker told you GPT-4's answer, you'd say they were wrong but you wouldn't say they were hallucinating. If a co-worker gave you GPT-3 or GPT-3.5's answer you'd definitely doubt their sanity.

nprateem 1177 days ago

Yeah sure. But the OP is in fear of their job. I think there's a fair way to go until we're out of work. Wrong is still wrong.

scotty79 1178 days ago

Ask it to show you an example of how to use the nom parser in Rust and try to compile and run the example.

Try a few times and ask for a more complex example.

is_true 1177 days ago

For example, I have tried it on a really simple task and it failed. It cannot generate correct CSS selectors, which makes sense as it doesn't understand specificity (like most humans, haha).
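
For reference, specificity is a three-part count (IDs; classes, attributes and pseudo-classes; element types and pseudo-elements) compared left to right. Below is a rough Python sketch of that count. The function name is made up for illustration, and it deliberately ignores the special rules for :not(), :is() and :where() as well as legacy single-colon pseudo-elements such as :before, so treat it as an approximation rather than a full implementation of the spec.

```python
import re
from typing import Tuple

_ATTR = r"\[[^\]]*\]"                     # [type="text"]
_PSEUDO_ELEMENT = r"::[\w-]+"             # ::before
_PSEUDO_CLASS = r":[\w-]+(?:\([^)]*\))?"  # :hover, :nth-child(2n+1)
_ID = r"#[\w-]+"                          # #main
_CLASS = r"\.[\w-]+"                      # .active
_TYPE = r"[A-Za-z][\w-]*"                 # div, a, li


def _take(pattern: str, text: str) -> Tuple[int, str]:
    """Count matches of pattern and blank them out of text."""
    count = len(re.findall(pattern, text))
    return count, re.sub(pattern, " ", text)


def specificity(selector: str) -> Tuple[int, int, int]:
    """Approximate (ids, classes, types) specificity of one selector."""
    # Attributes go first so that values like [href="#top"] are not
    # mistaken for ID or class selectors later on.
    attrs, rest = _take(_ATTR, selector)
    pseudo_elements, rest = _take(_PSEUDO_ELEMENT, rest)
    pseudo_classes, rest = _take(_PSEUDO_CLASS, rest)
    ids, rest = _take(_ID, rest)
    classes, rest = _take(_CLASS, rest)
    # Whatever names remain are element types; "*" and combinators
    # contribute nothing.
    types = len(re.findall(_TYPE, rest))
    return (ids, classes + attrs + pseudo_classes, types + pseudo_elements)


if __name__ == "__main__":
    for sel in ("#nav a", ".menu .item a.link", "ul li.active > a:hover"):
        print(sel, specificity(sel))
```

Because Python compares tuples element by element, specificity("#nav a") > specificity(".menu .item a.link") holds, which matches the rule that a single ID outweighs any number of classes.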
