Hacker News
by cryptozeus 1055 days ago
Is it just me or does everyone trust AI opinions less and less? Every time I ask it to find the top 5 of something, I double-check myself and almost always find it to be wrong. For example, try searching for the top 5 restaurants around you in Bard. Some of them don't even exist lol, and some are just random if you cross-verify against actual popularity on Yelp etc.
6 comments

Using language models for location or time based things is not recommended, as this usually requires non-textual data. Better to use them for general knowledge questions, programming help, translation, or writing. Asking them to do any complex calculations (especially when they also require non-text raw data, like inflation in a given time period) is also futile.
> general knowledge questions, programming help, translation, or writing.

They get all of these wrong too. It's like some AI-specific variant of the Gell-Mann amnesia effect. The answer is usually right in the first sentence, but if you really know the subject, it's often either very debatable or completely wrong by the halfway mark of the paragraph. Meanwhile, the associated brand authority is problematic.

They don't get them all wrong, and even when they are not 100% correct they're usually better than nothing.

For instance, I needed to write code to spawn a child process and communicate with it via stdin/stdout in C++. This is pretty easy in most modern languages, but in C++ you have to do POSIX's dumb process-spawning dance pretty much with raw syscalls: fork, execve, etc.

Rather than googling all the syscalls I would need and how to arrange them I just asked ChatGPT to do it. I've done it before so it was much easier to verify than to start from scratch.

And it got it 90% right. The only bit it got wrong was to make a single pipe and connect it to both stdin and stdout, rather than one pipe for each. But that was easy to spot and fix.
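
For reference, here is a minimal sketch of the two-pipe arrangement described above. It is not the code ChatGPT produced. `run_child` is a hypothetical helper name, and it uses `execvp` rather than `execve` so the program is looked up on `PATH`. It assumes the input is small enough to fit in the pipe buffer, since it writes everything before reading anything, and that the child actually reads its stdin.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs argv[0] with the given arguments, feeds `input`
// to its stdin, and returns everything it wrote to stdout.
// Assumption: `input` is small (below the pipe buffer size); larger inputs
// would need interleaved reads and writes to avoid a deadlock.
std::string run_child(char* const argv[], const std::string& input) {
    assert(argv != nullptr && argv[0] != nullptr);

    int to_child[2];    // parent writes to [1], child reads from [0]
    int from_child[2];  // child writes to [1], parent reads from [0]
    if (pipe(to_child) == -1) throw std::runtime_error("pipe failed");
    if (pipe(from_child) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        throw std::runtime_error("pipe failed");
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        throw std::runtime_error("fork failed");
    }

    if (pid == 0) {
        // Child: one pipe becomes stdin, the other becomes stdout.
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        // The original descriptors are no longer needed after dup2.
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127);  // only reached if execvp failed
    }

    // Parent: close the ends that belong to the child.
    close(to_child[0]);
    close(from_child[1]);

    // Send the input, then close the write end so the child sees EOF.
    std::size_t offset = 0;
    while (offset < input.size()) {
        ssize_t n = write(to_child[1], input.data() + offset,
                          input.size() - offset);
        if (n <= 0) break;
        offset += static_cast<std::size_t>(n);
    }
    close(to_child[1]);

    // Read the child's stdout until it closes its end.
    std::string output;
    char buffer[4096];
    ssize_t n;
    while ((n = read(from_child[0], buffer, sizeof buffer)) > 0) {
        output.append(buffer, static_cast<std::size_t>(n));
    }
    close(from_child[0]);

    // Reap the child so it does not linger as a zombie.
    int status = 0;
    waitpid(pid, &status, 0);
    return output;
}
```

With a single pipe wired to both stdin and stdout, the child would read back its own output and the parent would have no separate channel in either direction, which is why each direction needs its own pipe. Calling `run_child` with an `argv` of `{"cat", nullptr}` and some short input should hand the same text back.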

AI - at least for programming - is an enormous time saver. Could easily increase productivity by 50% in some cases.

In 5 years I expect it to be as normal as using an IDE. There are still people that slow themselves down by using unintelligent editors, and they will probably continue to live in the 80s, but people that use tools to help them will expect to use Copilot or similar all the time.

> In 5 years I expect it to be as normal as using an IDE.

Five years seems too conservative. Five years ago we only had GPT-1, which only generated funny word salad with acceptable syntax. An AI like ChatGPT seemed unthinkable at the time. And ChatGPT came out only last year. In five years similarly radical changes could happen. Programmers might actually get replaced with AI. Sounds too radical? But ChatGPT also would have sounded too radical five years ago!

It's silly to extrapolate breakthroughs.
AI breakthroughs have been happening at an increasing rate since at least 2012, when AlexNet came out. Before that, "AI" was mostly OCR. The speed of progress is crazy. It doesn't look like a slowdown is ahead of us.
Gell-Mann is exactly what I’ve been referencing in conversation recently. Any professional will gladly explain why their field is really much too nuanced and complex for LLMs to threaten in the near term, before seamlessly explaining how close we are to all those other engineers/doctors/clerks being automated right away.
In my opinion, the roles directly threatened by LLMs are coordinators, client managers, etc.: people whose job depends entirely on "soft skills" for interacting, giving vague summaries and status updates, and assigning tasks.

A chat bot that scans Jira, accepts phone calls, and runs scrums can't possibly be any less reliable than some of the people I've worked with.

GPT-4 outperforms average students in exams in several fields. There are a lot of benchmarks, and GPT-4 mostly does very well as long as the field relies enough on declarative knowledge.
I get different answers every time to "what is the third element in the periodic table" from llama2.

I'll hold off actually using them for now.

That model is not state of the art. Even GPT-3.5 can answer this question.
On GPT 3.5

Q: "What is the seventy fourth element of the periodic table?"

A: "The seventy-fourth element of the periodic table is Rhenium..."

But this is really shooting fish in a barrel. Given the way LLMs work why would you expect them to provide factually correct text completion?

Llama2, when fine-tuned, can be better than GPT3.5

Source: a trusted coworker

It's just reality sinking in.
Well, it doesn’t surprise me. I have been saying for a while that these LLMs hallucinate nonsense to the point where you end up triple-checking whatever they output.

LLMs thrive in creative, low-stakes applications, mostly around fantasy or creative writing. Anyone using them seriously for high-risk use cases, other than summarization, is going to be very disappointed.

Perhaps the outcome is we get better at actually checking things, not a terrible result.
I recommend that LLM users leverage retrieval-augmented generation (RAG).
I'm glad that expectations are shifting. At the extremes, it's either a fancy parlor trick or a hyper-intelligent god. A lot of the original hype has skewed much closer to the hyper-intelligent god side of the spectrum. It's definitely not a fancy parlor trick, but it's likely closer to that than the other side it's being hyped as.
I think the most amusing comment I've read here in the last few weeks called it "demented Clippy".
My trust factor for online opinion is ranked:

1) Online forums (adding 'reddit' or 'hacker news' to a search query) 2) GPT4 3) Google search