HN Mirror

by mhowland 634 days ago

"They're really good at some things, terrible at others, and prone to doing something totally wrong some fraction of the time."

I agree 100% with this sentiment, but, it also is a decent description of individual humans.

This is what processes and control systems/controls are for. These are evolving at a slower pace than the LLMs themselves at the moment so we're looking to the LLM to be its own control. I don't think it will be any better than the average human is at being their own control, but by no means does that mean it's not a solvable problem.

5 comments

latexr 634 days ago

> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

But you can understand individual humans and learn which are trustworthy for what. If I want a specific piece of information, I have people in my life that I know I can consult to get an answer that will most likely be correct and that person will be able to give me an accurate assessment of their certainty and they know how to accurately confirm their knowledge and they’ll let me know later if it turns out they were wrong or the information changed and…

None of that is true with LLMs. I never know if I can trust the output, unless I’m already an expert on the subject. Which kind of defeats the purpose. Which isn’t to say they’re never helpful, but in my experience they waste my time more often than they save it, and at an environmental/energy cost I don’t personally find acceptable.

closeparen 634 days ago

It defeats the purpose of LLM as personal expert on arbitrary topics. But the ability to do even a mediocre job with easy unstructured-data tasks at scale is incredibly valuable. Businesses like my employer pay hundreds of professionals to run business process outsourcing sites where thousands of contractors repeatedly answer questions like "does this support contact contain a complaint about X issue?" And there are months-long lead times to develop training about new types of questions, or to hire and allocate headcount for new workloads. We frequently conclude it's not worth it.

kenjackson 634 days ago

Actually humans are much worse in this regard. The top performer on my team had a divorce and his productivity dropped by like a factor of 3 and quality fell off a cliff.

Another example from just yesterday: I needed to solve a complex recurrence relation. A friend of mine who is good at math (math PhD) worked on it with me for about 30 minutes; after a couple of false starts we still had no solution. Then he said to try ChatGPT, and we got the answer in 30s and spent about 2 minutes verifying it.

andrepd 634 days ago

I call absolute bullshit on that last one. There's no way ChatGPT solves a maths problem that a maths PhD cannot solve, unless the solution is also googleable in 30s.

_w1tm 634 days ago

> unless the solution is also googleable in 30s.

Is anything googleable in 30s? It feels like finding the right combination of keywords that bypasses the personalization and poor quality content takes more than one attempt these days.

gomerspiles 634 days ago

Right, AI is really just what I use to replace google searches I would have used to find highly relevant examples 10 years back. We are coming out of a 5 year search winter.

andrepd 633 days ago

Duck-duck-goable then :)

salawat 633 days ago

>Actually humans are much worse in this regard. The top performer on my team had a divorce and his productivity dropped by like a factor of 3 and quality fell off a cliff.

Wow. Nice of you to see a coworker go through a traumatic life event, and the best you can dredge up is to bitch about lost productivity and a decrease in selfless output of quality to someone else's benefit while they are trying to stitch their life back together.

SMH. Goddamn.

Hope your recurrence relation was low bloody stakes. If you spent only two minutes verifying something coming out of a bullshit machine, I'd hazard you didn't do much in the way of boundary condition verification.
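For readers wondering what boundary condition verification could look like here, a minimal sketch follows. The thread never states the actual recurrence, so the example assumes a hypothetical one, T(n) = 2T(n-1) + 1 with T(0) = 0, and the helper name `check_closed_form` is made up for illustration. It iterates the recurrence from its base cases and reports every index where a candidate closed form disagrees. Agreement on a finite range is evidence, not a proof.

```python
from typing import Callable, List, Sequence


def check_closed_form(
    base: Sequence[int],
    step: Callable[[Sequence[int], int], int],
    closed_form: Callable[[int], int],
    n_max: int = 50,
) -> List[int]:
    """Return every n in [0, n_max] where closed_form(n) differs from the recurrence.

    base        -- the initial values T(0), T(1), ... (the boundary conditions)
    step        -- computes T(n) from the list of earlier values and n
    closed_form -- the candidate formula to test
    """
    values = list(base)
    # Build T(n) directly from the recurrence, starting after the base cases.
    for n in range(len(base), n_max + 1):
        values.append(step(values, n))
    # Compare at every index, including the base cases themselves.
    return [n for n in range(n_max + 1) if closed_form(n) != values[n]]


# Hypothetical recurrence (an assumption, not the one from the thread):
# T(n) = 2*T(n-1) + 1, T(0) = 0, whose closed form is 2**n - 1.
def step(values: Sequence[int], n: int) -> int:
    return 2 * values[n - 1] + 1


def good(n: int) -> int:
    return 2**n - 1


# A candidate that is right everywhere except at the boundary n = 0.
def off_at_base(n: int) -> int:
    return 2**n - 1 if n > 0 else 1


mismatches_good = check_closed_form([0], step, good)
mismatches_bad = check_closed_form([0], step, off_at_base)
```

Here `mismatches_good` comes back empty, while `mismatches_bad` flags only index 0, which is exactly the kind of error a quick spot check on a few mid-range values would miss.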

Gazoche 634 days ago

> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

But humans can be held accountable, LLMs cannot.

If I pay a human expert to compile a report on something and they decide to randomly make up facts, that's malpractice and there could be serious consequences for them.

If I pay OpenAI to do the same thing and the model hallucinates nonsense, OpenAI can just shrug it off and say "oh that's just a limitation of current LLMs".

linsomniac 634 days ago

>also is a decent description of individual humans

A friend of mine was moving from software development into managing devs. He told me: "They often don't do things the way or to the quality I'd like, but 10 of them just get so much more done than I could on my own." This was him coming to terms with letting go of some control, and switching to "guiding the results" rather than direct control.

The LLMs are a lot like this.

theamk 634 days ago

Your friend got lucky. I've seen (and worked with) people with negative productivity: they make the effort and sometimes they commit code, but it inevitably ends up being broken, and I realize that it would take less of my time to write the code myself than to spend all the time explaining and then fixing bugs.

The LLMs are a lot like this.

YeGoblynQueenne 633 days ago

>> I agree 100% with this sentiment, but, it also is a decent description of individual humans.

Why would that be a good thing? The big thing with computers is that they are reliable in ways that humans simply can't ever be. Why is it suddenly a success to make them just as unreliable as humans?

welshwelsh 633 days ago

I thought the big thing with computers is that they are much cheaper than humans.

If we are evaluating LLM suitability for tasks typically performed by humans, we should judge them by the same standards we judge humans. That means it's OK to make mistakes sometimes.

Too 634 days ago

You missed quoting the next sentence about providing a confidence metric.

Humans may be wrong a lot but at least the vast majority will have the decency to say “I don’t know”, “I’m not sure”, “give me some time to think”, “my best guess is”. In contrast, most LLMs today just spew out more hallucinations in full confidence.
