HN Mirror

Hacker News

by philovivero 1138 days ago

> Both of those are systems which had to work right. Large language models are not even close to being safe to use in such applications. Until LLMs report "don't know" instead of hallucinating, they're limited to very low risk applications such as advertising and search.

Are humans limited to low-risk applications like that?

Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

And I don't want to count the number of times I've personally done that, but I'm sure it's >0. And I hate to tell you, but I've spent the last 20 years in positions of authority that could have caused massive amounts of damage, not only to the companies I've been employed by, but to a large cross-section of society as well. And those fools I referenced in the last paragraph? Same.

I think people are too hasty to discount LLMs, or LLM-backed agents, or other LLM-based applications because of their limitations.

(Related: I think people are too hasty to discount the catastrophic potential of self-modifying AGI as well)


memefrog 1138 days ago

Can people please stop making this comment in reply to EVERY criticism of LLMs? "Humans are flawed too".

We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.

You remember a few isolated incidents because they're salient. That does not mean they are representative of your average personal interactions.

famouswaffles 1138 days ago

>We do not normally hallucinate.

Oh yes we do lol. Many experiments show that our perception of reality and of our own cognition is entirely divorced from what's really going on.

Your brain is making stuff up all the time. Sense data you perceive is partly fabricated. Your memories are partly fabricated. Your decision rationales are post hoc rationalizations more often than not. That is, you don't genuinely know why you make certain decisions or what preferences actually inform them. You just think you do. You can't recreate previous mental states. You are not usually aware of it, but it is happening.

LLMs are just undoubtedly worse right now.

worrycue 1138 days ago

We don’t hallucinate in such a way, or to such an extent, that it compromises our ability to do our jobs.

Currently no one will trust an LLM to even run a helpline - that's just a lawsuit waiting to happen should the AI hallucinate a “solution” that results in loss of property, injury or death.

famouswaffles 1138 days ago

>Currently no one will trust an LLM to even run a helpline - that's just a lawsuit waiting to happen should the AI hallucinate a “solution” that results in loss of property, injury or death.

I'm not quite sure exactly what you mean by helpline here (general customer service or something more specific?), but assuming the former...

How much power do you think most helplines actually have? Most are running off pre-written scripts/guidelines with very little in the way of decisional power. There's a reason for that.

Injury or death? LLM hallucinations are relational. Unless you're speaking to Dr GPT or something to that effect, a response resulting in injury or death isn't happening.

strokirk 1137 days ago

Having worked in the help-line business, I can tell you that many corporations would and do use LLMs for their helpline, and used worse options before.

jph00 1137 days ago

> We do not normally hallucinate. We are sometimes wrong, and sometimes are wrong about the confidence they should attach to their knowledge. But we do not simply hallucinate and spout fully confidence nonsense constantly. That is what LLMs.

In my average interaction with GPT-4 there are far fewer errors than in this paragraph. I would say that here you in fact "spout fully confidence nonsense" (sic).

Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence. Some LLMs are better than some humans in some situations at doing these things.

You seem to be hung up on the word "hallucinate". It is, indeed, not a great word and many researchers are a bit annoyed that's the term that's stuck. It simply means for an LLM to state something that's incorrect as if it's true.

The times that LLMs do this do stand out, because "You remember a few isolated incidents because they're salient".

leoedin 1137 days ago

> Some humans are better than others at saying things that are correct, and at saying things with appropriately calibrated confidence.

That's true - which is why we have constructed a society with endless selection processes. Starting from kindergarten, we are constantly assessing people's abilities - so that by the time someone is interviewing for a safety-critical job they've been through a huge number of gates.

lexandstuff 1137 days ago

The equivalent of hallucinations in LLMs is false memories [1] in people. They happen all the time.

[1] https://en.wikipedia.org/wiki/False_memory

hyperthesis 1138 days ago

> Are humans limited to low-risk applications like that?

No, but arguably civilization consists of mechanisms to manage human fallibility (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc). We might not fully understand why, but we've found methods that sorta kinda "work".

> could have caused

That's why they didn't.

TeMPOraL 1138 days ago

> No, but arguably civilization consists of mechanisms to manage human fallibility

Exactly. Civilization is, arguably, one big exercise in reducing variance in individuals, as low variance and high predictability is what lets us work together and trust each other, instead of seeing each other as threats and hiding from each other (or trying to preemptively attack). The more something or someone is unpredictable, the more we see it or them as a threat.

> (separation of powers, bicameralism, "democracy", bureaucracy, regulations, etc).

And on the more individual scale: culture, social customs and the public school system are all forces that shape humans from the youngest age, reducing variance in thoughts and behaviors. Exams of all kinds, including psychological ones, prevent high-variance individuals from being able to do large amounts of harm to others. The higher the danger, the higher the bar.

There are tests you need to pass to be able to own and drive a car. There are tests you may need to pass to own a firearm. There are more tests still before you'll be allowed to fly an aircraft. Those tests are not there just to ensure your skills - they also filter high-variance individuals, people who cannot be safely given responsibility to operate dangerous tools.

Further still, society has mechanisms to eliminate high-variance outliers. Lighter cases may get some kind of medical or spiritual treatment, and (with gates in place to keep them away from guns and planes) it works out OK. More difficult cases eventually get locked up in prisons or mental hospitals. While there are a lot of specific things to discuss about the prison and mental care systems, their general, high-level function is simple: they keep both predictably dangerous and high-variance (i.e. unpredictably dangerous) people stashed safely away, where they can't disrupt or harm others at scale.

> We might not fully understand why, but we've found methods that sorta kinda "work".

Yes, we've found many such methods at every level - individual, familial, tribal, national - and we stack them all on top of each other. This creates the conditions that let us live in larger groups, with fewer conflicts, as well as safely use increasingly powerful (i.e. potentially destructive) technologies.

throwuwu 1137 days ago

I think you’re weighting the contribution of authority a bit too highly. The bad actors to be concerned about are a very small percentage of the population, and we do need institutions with authority to keep those people at bay, but it’s not like there’s this huge pool of “high variance” people that need to be screened out. The vast majority of people are extremely close in both opinion and ability; any semblance of society would be impossible otherwise.

TeMPOraL 1137 days ago

> it’s not like there’s this huge pool of “high variance” people that need to be screened out. The vast majority of people are extremely close in both opinion and ability, any semblance of society would be impossible otherwise.

Yes, but I'm saying it's not an accident - I've mentioned mechanisms like culture, social customs, and education, which we've been using in some form for all our recorded history. I should've probably added violent conflicts within and between tribes/groups, too, which also acted to reduce variance by culling the more volatile and less agreeable people. People today are "extremely close in both opinion and ability" because for the past couple thousand years, generation by generation, we've been busy reducing the variance of individuals.

EDIT: keeping high-variance individuals locked up safely away is just one of the methods we use, specifically to deal with outliers. It too traces back to the dawn of recorded history - shunning, expelling individuals from the tribe (which often meant certain death), sending them to faraway lands, or forcing them into war, were other common means past societies used to eliminate high-variance outliers.

As for authority, it's a separate topic - I argue that hierarchical governance is an artifact of scale: it's necessary to coordinate groups past a certain size (~Dunbar's number), when our basic social intuitions are no longer up to the task. But the first level of hierarchy can handle only so many people, and if you want to coordinate multiple such groups, you need to add another layer... and that's how, over time, human societies scaled from tribes of a couple dozen people to nation states of hundreds of millions.

Even as the focus is usually on the national governments, the entire hierarchy is still there - you have states and lands/voivodeships/counties with their own governance, then another level for a major city and surrounding villages, then yet another level in each individual village, and one or two levels in the city itself, etc. We don't often pay attention to it, but the hierarchy of governance does reach down, in some form, all the way to groups of a couple hundred people or less.

ilyt 1138 days ago

>Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

Spouting out the most ignorant stuff is one of the lowest-risk things you can do in general. We're talking about running code where a bug can do a ton of damage, financial or otherwise, not water-cooler conversations.

cmiles74 1138 days ago

In the train example, the UI is in place to prevent a person from making a dangerous route. I think the idea here is that an LLM cannot take the place of such a UI as they are inherently unreliable.

NikolaNovak 1138 days ago

To your point, humans are augmented by checklists and custom processes in critical situations, and applications very certainly include features that mimic such safety checklists. We don't NEED to start from an LLM perspective if our goal is different and doesn't benefit from an LLM. Not all UI or architecture is fit for all purposes.

dorkwood 1137 days ago

Couldn’t you make this same argument with a chat bot that wasn’t an LLM at all?

“Yes, it may have responded with total nonsense just now, but who among us can say they’ve never done the same in conversation?”

Mawr 1136 days ago

> Are humans limited to low-risk applications like that?

Yes, of course. That's why the systems the parent mentioned designed humans out of the safety-critical loop.

> Because humans, even some of the most humble, will still assert things they THINK are true, but are patently untrue, based on misunderstandings, faulty memories, confused reasoning, and a plethora of others.

> I can't count the number of times I've had conversations with extremely well-experienced, smart techies who just spout off the most ignorant stuff.

The key difference is that when the human you're having a conversation with states something, you're able to ascertain the likelihood of it being true based on available context: How well do you know them? How knowledgeable are they about the subject matter? Does their body language indicate uncertainty? Have they historically been a reliable source of information?

No such introspection is possible with LLMs. Any part of anything they say could be wrong and to any degree!

ra 1138 days ago

I wholeheartedly agree with the main thrust of your comment. Care to expand on your (related: potential catastrophe) opinion?