Hacker News
by petercooper 1088 days ago
Can't speak for this tool yet, but ChatGPT has been great for this use case. When a tool doesn't have a man page, or has hard-to-navigate docs, or whatever, asking "What does $flag mean in $tool" tends to work (when it doesn't hallucinate something totally wrong).

One recent example: "what does -w do on curl" .. not a single top 10 result on Google mentions it in the context, but GPT3.5 nails it in two seconds complete with a working example. I know "-w" is interpreted by Google in a special way, but frankly given the obvious context I shouldn't need to know how Google works. (Through experience I also know https://explainshell.com/explain?cmd=curl+-w will do a good job, but ChatGPT actually provides a working example which is even better.)

That said, I do think you need a good critical eye to use LLMs in this context. It's like relying on a calculator. You still need mental math skills to know that 91 * 10 can't equal 2511. Similarly, when GPT starts hallucinating, it helps if you have a high sensitivity to smelling it out.
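To make the calculator analogy concrete, here is a minimal Python sketch of the kind of mental check being described. It is only an illustration, not something from the thread: the function name `plausible_product` is made up, and it assumes positive integers. Passing the check does not prove an answer right; failing it proves the answer wrong.

```python
def plausible_product(a: int, b: int, claimed: int) -> bool:
    """Cheap plausibility checks for a claimed value of a * b.

    Assumes a, b and claimed are positive integers (an assumption of
    this sketch). Returns False when the claim is certainly wrong.
    """
    # Last-digit rule: the final digit of a product depends only on
    # the final digits of the two factors.
    if ((a % 10) * (b % 10)) % 10 != claimed % 10:
        return False
    # Digit-count rule: an m-digit number times an n-digit number
    # has either m + n - 1 or m + n digits.
    digits = len(str(a)) + len(str(b))
    return len(str(claimed)) in (digits - 1, digits)


print(plausible_product(91, 10, 2511))  # fails the last-digit rule
print(plausible_product(91, 10, 910))   # passes both checks
```

The same habit carries over to LLM answers: you rarely verify everything, but a couple of cheap invariants catch the worst nonsense.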

3 comments

I think a spreadsheet is a slightly better analogy than a calculator. The latter has well-defined capabilities and essentially 100% accuracy within those bounds. A spreadsheet with a minor typo in one field can produce drastically incorrect results that appear fine to the untrained eye.

> It's like relying on a calculator. You still need mental math skills to know that 91 * 10 can't equal 2511. Similarly, when GPT starts hallucinating, it helps if you have a high sensitivity to smelling it out.

Well, at least my calculators don't have the error rate GPT4 still has. Especially for seemingly simple things like a command flag, I have zero trust that GPT won't give me something that will eventually erase all my data.

A calculator isn't going to give me errors. I don't need to fact-check it constantly.

Maybe, but I suspect you're smart and have the intuition to "just know" if something went very wrong. For example, if you multiply 5 by some other integer and the result ends with a 2, you'd just feel it in your gut, no? This is not true for everyone!

It's not uncommon for folks with reasonable math skills to end up behind a cash register insisting they're right when such errors occur, and I see a high likelihood of the same thing happening when relying on "AI". Domain knowledge and good intuition will continue to be valuable.

How often are you using LLMs?

Somehow I knew that this question would come up; questioning the "progress" makes me a heretic.

So, for the last 2-3 months I've subscribed to ChatGPT4 (and to Copilot for much longer), worked through most of the HN threads on tips and reviews and the posts I could find on "prompt engineering", and had hundreds of sessions with ChatGPT4. I still might have missed something, but I think I have a rather good idea of what's going on.

1. It's rather good at understanding what I want. I can dump pretty much anything into it and give it certain rules (things we described years ago as "Google fu" until Google SERP became useless) and it will make something out of it.

2. It's a nice rubber duck to discuss things with and to get a broad overview of certain topics.

3. It's amazingly stupid about the validity of its answers, even if I ask it for its confidence. It's like talking to an 8-year-old know-it-all: you have to fact-check everything. If I confront it with an error, it even reacts like an 8-year-old.

4. Initial responses for intentionally broad topics (summed up with "give me ansible yaml to deploy wireguard to N servers") often don't work at all, and after an hour of query-response you'd have been better off reading the ansible docs.

5. For intentionally specific topics (summed up with "what's the fastest algorithm to sort this given x, y, z and bla will never be A"), its initial responses frequently come up with good, sometimes surprisingly creative solutions.

All in all: why oh why would I trade correctness for a significant error rate ("hallucination" is a word from SV marketing hell) and for debugging bullshit answers? Since debugging is already a big drag in programming, I need things I can trust so I can build more things on top of them. If I can't trust the "command" an LLM generates 100%, I'll never let it execute its code directly.

Thank you very much for your ChatGPT4 opinion. Could you also write up your opinion of Copilot?

Agreed. For me, ChatGPT has been the killer app for the terminal, even as someone who’s pretty comfortable in the terminal.

For me, there are multiple killer apps. One is in programming. In many cases I work with a language I'm a bit unfamiliar with, so I ask it for idioms for things that I know a language should be able to do (like summing). Upon reading the code I can see that it's correct.

This used to be SO, but ChatGPT works way faster.
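As an illustration of the "idiom" kind of question (a Python sketch with made-up sample data, not the commenter's actual session): the hand-rolled loop and the built-in are easy to compare by eye, which is what makes this sort of answer cheap to verify.

```python
values = [3, 1, 4, 1, 5]  # made-up sample data

# What someone new to the language might write by hand.
total = 0
for v in values:
    total += v

# The idiomatic version: the built-in sum().
idiomatic_total = sum(values)

# sum() also accepts a generator, e.g. summing a derived quantity.
words = ["curl", "ansible", "wireguard"]  # made-up sample data
total_chars = sum(len(w) for w in words)

print(total, idiomatic_total, total_chars)
```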