
Somehow I knew that this question would come up; questioning the "progress" makes me a heretic.

So for the last 2-3 months I've subscribed to ChatGPT4 (and much longer to Copilot), worked through most of the HN threads on tips and reviews and the posts I could find on "prompt engineering", and had hundreds of sessions with ChatGPT4. I still might have missed something, but I think I have a rather good idea of what's going on.

1. It's rather good at understanding what I want. I can dump pretty much anything into it and give it certain rules (things we described years ago as "Google fu" until Google SERP became useless) and it will make something out of it.

2. It's a nice rubber duck to discuss things with and get a broad overview of certain topics.

3. It's amazingly stupid about the validity of its answers, even if I ask it for its confidence. It's like talking to an 8-year-old know-it-all: you have to fact-check everything. If I confront it with an error, it even reacts like an 8-year-old.

4. Initial responses for intentionally broad topics (summed up with "give me ansible yaml to deploy wireguard to N servers") often don't work at all, and after an hour of query-response you're better off reading the Ansible docs.

5. For intentionally special topics (summed up with "what's the fastest algorithm to sort this given x, y, z and bla will never be A") it frequently comes up with good, sometimes surprisingly creative solutions right in the initial response.

All in all: why oh why would I trade correctness for a significant error rate ("hallucination" is a word from SV marketing hell) and debugging bullshit answers? Debugging is already a big drag in programming, so I need things I can trust to build more things on top of. If I can't trust the "command" an LLM generates 100%, I'll never let it execute its code directly.