Hacker News
by nforgerit 1090 days ago
> It's like relying on a calculator. You still need mental math skills to know that 91 * 10 can't equal 2511. Similarly, when GPT starts hallucinating, it helps if you have a high sensitivity to smelling it out.

Well, at least my calculators don't have the error rate GPT4 still does. Especially for seemingly simple things like a command flag, I have zero trust that GPT won't give me something that will eventually erase all my data.

2 comments

A calculator isn't going to give me an error. I don't need to fact check it constantly.
Maybe, but I suspect you're smart and have the intuition to "just know" if something went very wrong. For example, if you multiply 5 by some other integer and the result ends with a 2, you'd just feel it in your gut, no? This is not true for everyone!

It's not uncommon for folks with reasonable math skills to end up behind a cash register insisting they're right when such errors occur, and I see a high likelihood of the same thing happening with reliance on "AI" too. Domain knowledge and good intuition will continue to be valuable.
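To make the "gut feeling" concrete, here is a minimal sketch of the two cheap checks hinted at above: the last digit and the rough size of the result. The function name `plausible_product` is made up for illustration, and passing the checks does not prove a result is right; it only catches the obviously wrong ones.

```python
def plausible_product(a: int, b: int, claimed: int) -> bool:
    """Cheap sanity checks on a claimed product a * b.

    Ignores signs and never performs the full multiplication.
    A True result means "not obviously wrong", not "correct".
    """
    # Anything times zero is zero, so handle that case directly.
    if a == 0 or b == 0:
        return claimed == 0

    a, b, claimed = abs(a), abs(b), abs(claimed)

    # Last-digit check: the final digit of a * b depends only on
    # the final digits of a and b.
    if (a % 10) * (b % 10) % 10 != claimed % 10:
        return False

    # Magnitude check: an m-digit number times an n-digit number
    # has either m + n - 1 or m + n digits.
    digits = len(str(a)) + len(str(b))
    return len(str(claimed)) in (digits - 1, digits)


print(plausible_product(91, 10, 2511))  # last digit cannot be 1
print(plausible_product(5, 7, 32))      # a multiple of 5 cannot end in 2
print(plausible_product(91, 10, 910))   # passes both checks
```

A wrong answer like 900 for 91 * 10 would still slip through, which is rather the point: intuition filters out the absurd results, it does not replace checking.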

how often are you using LLMs?
Somehow I knew that this question would come up; questioning the "progress" makes me a heretic.

So for the last 2-3 months I have subscribed to ChatGPT4 (and for much longer to Copilot), worked through most of the HN threads on tips and reviews and the posts I could find on "prompt engineering", and have had hundreds of sessions with ChatGPT4. So, I still might have missed something, but I think I have a rather good idea of what's going on.

1. It's rather good at understanding what I want. I can dump pretty much anything into it and give it certain rules (things we described years ago as "Google fu" until Google SERP became useless) and it will make something out of it.

2. It's a nice rubber duck to discuss things with and to get a broad overview of certain topics.

3. It's amazingly stupid about the validity of its answers, even if I ask it for its confidence. It's like talking to an 8-year-old know-it-all: You have to fact check everything. If I confront it with the error, it even reacts like an 8-year-old.

4. Initial responses for intentionally broad topics (summed up with "give me ansible yaml to deploy wireguard to N servers") often don't work at all, and after an hour of query-response you're better off reading the Ansible docs.

5. For intentionally specific topics (summed up with "what's the fastest algorithm to sort this given x, y, z and bla will never be A"), its initial responses frequently contain good, sometimes surprisingly creative solutions.

All in all: Why oh why would I trade correctness for a significant error rate ("hallucination" is a word from SV marketing hell) and debugging bullshit answers? Since debugging is already a big drag in programming, I need things I can trust to build more things on top of them. If I can't trust the "command" an LLM generates 100%, I'll never let it execute its code directly.
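For what it's worth, "never let it execute directly" could look roughly like the sketch below: a human-in-the-loop gate that prints the proposed command, flags a few obviously dangerous tokens, and only approves on an explicit "yes". Everything here is an assumption for illustration; the names `risk_reasons` and `approve` are made up, and the denylist is tiny and incomplete, so an empty warning list says nothing about safety. Nothing in this sketch actually runs the command.

```python
import shlex

# Hypothetical and deliberately incomplete denylists, for illustration only.
RISKY_PROGRAMS = {"rm", "dd", "mkfs", "shred"}
RISKY_FLAGS = {"--force", "--no-preserve-root", "-rf", "-fr"}


def risk_reasons(command: str) -> list[str]:
    """Return human-readable warnings for obviously risky tokens."""
    tokens = shlex.split(command)
    reasons = []
    if tokens and tokens[0] in RISKY_PROGRAMS:
        reasons.append(f"destructive program: {tokens[0]}")
    for token in tokens[1:]:
        if token in RISKY_FLAGS:
            reasons.append(f"risky flag: {token}")
    return reasons


def approve(command: str, ask=input) -> bool:
    """Show the command and its warnings; approve only on an explicit 'yes'."""
    print(f"Proposed command: {command}")
    for reason in risk_reasons(command):
        print(f"  warning: {reason}")
    answer = ask("Run this? Type 'yes' to confirm: ")
    return answer.strip().lower() == "yes"


if __name__ == "__main__":
    # The command string stands in for something an LLM suggested.
    if approve("rm -rf /tmp/scratch"):
        print("approved (this sketch still does not execute anything)")
    else:
        print("rejected")
```

The `ask` parameter exists only so the prompt can be replaced in tests; the default is the built-in `input`. The reviewer still has to read the command and know what the flags do, so this does not remove the need for domain knowledge.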

Thank you very much for your opinion on ChatGPT4. Could you also write up your opinion on Copilot?