by mhowland 634 days ago
"They're really good at some things, terrible at others, and prone to doing something totally wrong some fraction of the time."

I agree 100% with this sentiment, but, it also is a decent description of individual humans.

This is what processes and control systems/controls are for. These are evolving at a slower pace than the LLMs themselves at the moment so we're looking to the LLM to be its own control. I don't think it will be any better than the average human is at being their own control, but by no means does that mean it's not a solvable problem.

5 comments

> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

But you can understand individual humans and learn which are trustworthy for what. If I want a specific piece of information, I have people in my life that I know I can consult to get an answer that will most likely be correct and that person will be able to give me an accurate assessment of their certainty and they know how to accurately confirm their knowledge and they’ll let me know later if it turns out they were wrong or the information changed and

None of that is true with LLMs. I never know if I can trust the output, unless I’m already an expert on the subject. Which kind of defeats the purpose. Which isn’t to say they’re never helpful, but in my experience they waste my time more often than they save it, and at an environmental/energy cost I don’t personally find acceptable.

It defeats the purpose of an LLM as a personal expert on arbitrary topics. But the ability to do even a mediocre job with easy unstructured-data tasks at scale is incredibly valuable. Businesses like my employer pay hundreds of professionals to run business process outsourcing sites where thousands of contractors repeatedly answer questions like "does this support contact contain a complaint about X issue?" And there are months-long lead times to develop training about new types of questions, or to hire and allocate headcount for new workloads. We frequently conclude it's not worth it.

Actually humans are much worse in this regard. The top performer on my team had a divorce and his productivity dropped by like a factor of 3 and quality fell off a cliff.

Another example from just yesterday is I needed to solve a complex recurrence relation. A friend of mine who is good at math (math PhD) helped me for about 30 minutes still without a solution and a couple of false starts. Then he said try ChatGPT and we got the answer in 30s and we spent about 2 minutes verifying it.
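For what it's worth, a quick check of that kind can be sketched in a few lines. The actual recurrence isn't given in the comment, so the one below (a(n) = 3a(n-1) - 2a(n-2) with a(0) = 0, a(1) = 1) and the helper name `check_closed_form` are purely illustrative assumptions. The idea is to generate terms from the recurrence itself and compare them against the candidate closed form at every index, including the initial conditions. A match on finitely many terms is evidence, not a proof.

```python
from typing import Callable, List, Optional, Sequence


def check_closed_form(
    step: Callable[[List[int], int], int],
    initial: Sequence[int],
    closed_form: Callable[[int], int],
    n_terms: int = 50,
) -> Optional[int]:
    """Return the first index where closed_form disagrees with the
    recurrence, or None if all n_terms terms agree.

    Hypothetical helper for illustration only.
    """
    # Build the sequence directly from the recurrence definition.
    terms = list(initial)
    for n in range(len(terms), n_terms):
        terms.append(step(terms, n))

    # Compare every term, starting at index 0 so the initial
    # (boundary) conditions are checked too.
    for n, expected in enumerate(terms):
        if closed_form(n) != expected:
            return n
    return None


# Illustrative recurrence (an assumption, not the one from the comment):
# a(n) = 3*a(n-1) - 2*a(n-2), a(0) = 0, a(1) = 1
def step(terms: List[int], n: int) -> int:
    return 3 * terms[n - 1] - 2 * terms[n - 2]


good = check_closed_form(step, [0, 1], lambda n: 2**n - 1)

# 2**n satisfies the recurrence step but not the initial conditions,
# so it is caught at index 0.
bad = check_closed_form(step, [0, 1], lambda n: 2**n)
```

The second candidate is the interesting one: it satisfies the recurrence step and fails only on the initial conditions, which is exactly the kind of error a rushed check can miss.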

I call absolute bullshit on that last one. There's no way ChatGPT solves a maths problem that a maths PhD cannot solve, unless the solution is also googleable in 30s.

> unless the solution is also googleable in 30s.

Is anything googleable in 30s? It feels like finding the right combination of keywords that bypasses the personalization and poor quality content takes more than one attempt these days.

Right, AI is really just what I use to replace the Google searches I would have used to find highly relevant examples 10 years back. We are coming out of a 5-year search winter.

Duck-duck-goable then :)
>Actually humans are much worse in this regard. The top performer on my team had a divorce and his productivity dropped by like a factor of 3 and quality fell off a cliff.

Wow. Nice of you to see a coworker go through a traumatic life event, and the best you can dredge up is to bitch about lost productivity and a decrease in selfless output of quality to someone else's benefit while they are trying to stitch their life back together.

SMH. Goddamn.

Hope your recurrence relation was low bloody stakes. If you spent only two minutes verifying something coming out of a bullshit machine, I'd hazard you didn't do much in the way of boundary condition verification.

> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

But humans can be held accountable, LLMs cannot.

If I pay a human expert to compile a report on something and they decide to randomly make up facts, that's malpractice and there could be serious consequences for them.

If I pay OpenAI to do the same thing and the model hallucinates nonsense, OpenAI can just shrug it off and say "oh that's just a limitation of current LLMs".

>also is a decent description of individual humans

A friend of mine was moving from software development into managing devs. He told me: "They often don't do things the way or to the quality I'd like, but 10 of them just get so much more done than I could on my own." This was him coming to terms with letting go of some control, and switching to "guiding the results" rather than direct control.

The LLMs are a lot like this.

Your friend got lucky, I've seen (and worked with) people with negative productivity - they make the effort and sometimes they commit code, but it inevitably ends up being broken, and I realize that it would take less of my time for me to write the code myself, rather than spend all the time explaining and then fixing bugs.

The LLMs are a lot like this.

>> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

Why would that be a good thing? The big thing with computers is that they are reliable in ways that humans simply can't ever be. Why is it suddenly a success to make them just as unreliable as humans?

I thought the big thing with computers is that they are much cheaper than humans.

If we are evaluating LLM suitability for tasks typically performed by humans, we should judge them by the same standards we judge humans. That means it's OK to make mistakes sometimes.

You missed quoting the next sentence about providing a confidence metric.

Humans may be wrong a lot, but at least the vast majority will have the decency to say “I don’t know”, “I’m not sure”, “give me some time to think”, or “my best guess is”. In contrast, most LLMs today just spew out more hallucinations in full confidence.