by nprateem 1178 days ago
I wouldn't worry. Its code output is 50% garbage. I'm not in fear of my job. All the hypesters assuming this will destroy lawyers, accountants, doctors etc. clearly don't understand that correctness isn't something they can just tack on to a prediction machine.
3 comments

This mindset just artificially downplays the fact that it will get incrementally better, multiple times, and strikingly quickly. It’s insanely shortsighted IMO. You don’t have to believe AI is the next coming of Jesus to recognize it’s reached a point (or soon will) to be genuinely disruptive. Corporate execs will absolutely and quickly pursue using AI to reduce costs and speed up development. At the expense of employees.
See my other comment. I think logical rigour requires another breakthrough.
Next version will be 40% garbage. Then 30%, then 20%... And then we get to the point where it has as much garbage as human code.
That's what you might expect, but it requires another breakthrough IMO. From what I understand of LLMs, they can't be incrementally improved to add logical reasoning since they're just guessing the next word. It's impressive and fine in many cases, but there are many where it's not enough.
They already have some form of reasoning though. GPT4 can solve novel problems. It's certainly not incredible at it, but the incremental logical reasoning improvements have already begun.
They do logically reason, that's the whole point.
They claim to. But there's no inherent understanding. Otherwise they could do maths and reason correctly about code. It's just probabilities.
But they do often seem to reason correctly about code.
It might get stuck on the last 10% of garbage, just like self-driving cars. Clearly it is an important change, but for some fields 90% might not be enough.

Personally, I think we will do more and more complicated things instead of just being done with programming.

What is the lower bound of garbage that we reach in, let's say, 20 years? Is it sufficiently low that you don't need someone to go over the output carefully? And if it fails, can we be sure it won't destroy any lives?
How do you know that?
Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.
>Can you show me sample prompts where GPT4 gives garbage? I am yet to find one.

Agreed. GPT-3 and GPT-3.5 commonly hallucinate. GPT-4 can certainly be made to behave badly, but on real questions I've put to GPT-4 it has a 0% hallucination rate. The few wrong answers it has given have been "sensibly wrong" in that it's highly likely an experienced human programmer would have made the same mistake (eg lots of Stack Overflow answers are wrong in the same way), and even its wrong answers have been helpful in guiding me towards the correct solution.

These occasional, "sensibly wrong" GPT-4 answers are fundamentally different from the hallucinated "answers" I've received from GPT-3 and GPT-3.5: correctly formatted academic bibliography citations for technical papers that never existed, by authors that never existed, in journals that never existed.

I mean here's another example from right now regarding Terraform:

> Me: how to only run data "archive_file" if a path exists?

> GPT4: <blah blah blah> add: depends_on = [fileexists("/path/to/file")]

This is nonsense. Terraform tells me:

> A single static variable reference is required: only attribute access and indexing with constant keys. No calculations, function calls, template expressions, etc are allowed here

I just get this rubbish all too often to be afraid for my job.

My experience has been different. It very often hallucinates variables or function identifiers for me. I never witnessed it doing that for code on the first output. But once the chat/context grows larger and I already asked for modifications to the posted code, it happens quite often.

A non-code example: Some days ago I asked it about "Searle's Wall" [0]. It gave me a mashup of the correct description and the Chinese Room experiment. So it clearly had the right answer somewhere in its data, but it mixed it up with the much more famous thought experiment.

[0]: https://www.researchgate.net/publication/260138925_Searle's_...

I was trying to create an AWS IAM policy restricted to assumed roles. I didn't know you can't use assumed roles in principalArn condition blocks but must refer to the IAM role instead. GPT4 happily wasted an hour shuffling the conditions around etc. instead of telling me this. Sometimes its policies were even malformed, and in all cases they didn't work.
Did you try giving GPT4 the relevant documentation along with your query?
Without wanting to sound arsey, why should I have to? If I'd had the relevant docs to hand I wouldn't have needed it.

This is the problem IMO. The model needs to somehow learn that out of its entire training set, the single sentence in the AWS docs saying not to use the assumed role ARN takes precedence over any patterns it may have learnt elsewhere in this specific situation.
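
To make the distinction concrete, here is a minimal Python sketch of the mapping described above: turning an assumed-role session ARN into the IAM role ARN that a principalArn condition would need. The helper names, the account ID `123456789012`, and the role and session names are all made up for illustration, and the sketch assumes the role was created without a path (the session ARN does not carry the path, so it would have to be added back by hand).

```python
import re

# Assumed shape of an STS assumed-role session ARN:
#   arn:<partition>:sts::<account-id>:assumed-role/<role-name>/<session-name>
_ASSUMED_ROLE = re.compile(
    r"^arn:(?P<partition>[^:]+):sts::(?P<account>\d{12}):assumed-role/"
    r"(?P<role>[^/]+)/(?P<session>[^/]+)$"
)


def role_arn_from_session_arn(session_arn: str) -> str:
    """Return the IAM role ARN behind an assumed-role session ARN.

    Hypothetical helper: it cannot recover a role path, because the
    session ARN does not include one.
    """
    match = _ASSUMED_ROLE.match(session_arn)
    if match is None:
        raise ValueError(f"not an assumed-role session ARN: {session_arn!r}")
    return f"arn:{match['partition']}:iam::{match['account']}:role/{match['role']}"


def principal_arn_condition(session_arn: str) -> dict:
    """Build a policy Condition block keyed on the role ARN, not the session ARN."""
    return {"ArnEquals": {"aws:PrincipalArn": role_arn_from_session_arn(session_arn)}}


if __name__ == "__main__":
    # Placeholder values, not taken from the thread.
    session = "arn:aws:sts::123456789012:assumed-role/DeployRole/alice"
    print(principal_arn_condition(session))
```

The point is only that the two ARN forms differ in service (`sts` vs `iam`) and resource type (`assumed-role/...` vs `role/...`), which is an easy detail to miss when it is stated once in the docs.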

What you are describing is very different from the hallucination behavior of GPT-3 and GPT-3.5.

Yes, GPT-4 came up with an incorrect answer, but it's an incorrect answer an experienced programmer could legitimately have come up with, and one they probably would have come up with before actually testing their code against the AWS endpoints. GPT-4 sometimes gets hard questions wrong. GPT-3 and GPT-3.5 make up nonsense.

If a coworker told you GPT-4's answer, you'd say they were wrong but you wouldn't say they were hallucinating. If a co-worker gave you GPT-3 or GPT-3.5's answer you'd definitely doubt their sanity.

Yeah sure. But the OP is in fear of their job. I think there's a fair way to go until we're out of work. Wrong is still wrong.
Ask it to show you an example of how to use the nom parser in Rust, then try to compile and run the example.

Try a few times and ask for a more complex example.

For example, I tried it on a really simple task and it failed: it cannot generate correct CSS selectors. That makes sense, as it doesn't understand specificity (like most humans, haha).
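
For anyone unsure what specificity means here, a rough sketch of how it is scored may help. This is a simplified Python illustration, not a full CSS parser: the function name `specificity` is made up for this example, and it deliberately ignores functional pseudo-classes such as `:not()`, `:is()`, `:where()` and `:nth-child()`.

```python
import re
from typing import Tuple


def specificity(selector: str) -> Tuple[int, int, int]:
    """Approximate CSS specificity as (ids, classes/attrs/pseudo-classes, types/pseudo-elements).

    Simplified sketch: handles plain compound selectors and combinators only.
    """
    # Attribute selectors such as [type=text] count in the middle column.
    attrs = re.findall(r"\[[^\]]*\]", selector)
    rest = re.sub(r"\[[^\]]*\]", " ", selector)

    # Pseudo-elements (::before) count like type selectors; strip them
    # before looking for single-colon pseudo-classes.
    pseudo_elements = re.findall(r"::[\w-]+", rest)
    rest = re.sub(r"::[\w-]+", " ", rest)

    ids = re.findall(r"#[\w-]+", rest)
    rest = re.sub(r"#[\w-]+", " ", rest)

    classes = re.findall(r"\.[\w-]+", rest)
    rest = re.sub(r"\.[\w-]+", " ", rest)

    pseudo_classes = re.findall(r":[\w-]+", rest)
    rest = re.sub(r":[\w-]+", " ", rest)

    # Whatever names remain are type selectors; "*" and combinators add nothing.
    types = re.findall(r"[A-Za-z][\w-]*", rest)

    return (
        len(ids),
        len(classes) + len(attrs) + len(pseudo_classes),
        len(types) + len(pseudo_elements),
    )


if __name__ == "__main__":
    for sel in ["ul li", "a.button", "#nav .item a:hover", ".menu .item a.link"]:
        print(sel, specificity(sel))
    # Tuples compare left to right, which mirrors how a single ID
    # outranks any number of classes.
    print(specificity("#nav a") > specificity(".menu .item a.link"))
```

The tuple comparison at the end shows the part people tend to get wrong: one ID selector beats three classes, no matter how long the class chain is.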