HN Mirror: Hacker News

by TeMPOraL 459 days ago

Precisely this. People dismiss the utility of LLMs because they aren't 100% reliable, without considering some basic facts:

- LLMs != the ChatGPT interface. They don't need to run in isolation, nor do they need to do everything end-to-end.

- There are no 100% reliable systems, technological or social. Voltages fluctuate, radiation flips bits, humans confabulate at least as much as LLMs, if not more, etc.

- We create reliability from unreliable systems.

LLMs aren't some magic unreliability pixie dust that makes everything they touch beyond repair. They're just another system with bounded reliability. Like anything else, they can be worked into larger systems, and the reliability of the whole can be improved that way.
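
The claim that reliability can be built out of unreliable parts can be made concrete with the textbook case of redundancy plus majority voting. This is a minimal sketch, not something from the thread: `majority_reliability` is a hypothetical helper, and it assumes the components fail independently, an assumption that has to be justified separately for any real system.

```python
from math import comb


def majority_reliability(p: float, n: int) -> float:
    """Probability that a strict majority of n components is correct.

    Assumption: each component is correct independently with probability p.
    n must be odd so that a vote can never tie.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    needed = n // 2 + 1  # smallest number of correct votes that forms a majority
    # Sum the binomial probabilities of getting `needed` or more correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


for voters in (1, 3, 5):
    print(voters, round(majority_reliability(0.9, voters), 5))
```

With components that are each 90% reliable, three voters give about 0.972 and five about 0.991. If the failures are correlated, the gain shrinks, which is why the independence assumption matters.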

EDIT: In fact, my example with probabilistic primality tests is bad because those tests are too nice: they let us compute tight bounds on the error rate in advance. LLMs are not like that. But then, a lot of systems we rely on in our daily lives also have this property: their reliability is established empirically, i.e., we improve them until they work reliably enough, then hope they'll keep on working and deal with random failures when they occur. So that's nothing new, either.
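
The primality-test example this edit refers to is not part of this excerpt. Assuming it means a Miller-Rabin-style test, the sketch below shows why such tests are "too nice": for an odd composite n, at least 3/4 of the candidate bases are witnesses, so k independent random rounds let a composite slip through with probability at most 4^-k, a bound known before running anything. The function names are illustrative.

```python
import random
from typing import Optional

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n: int, rounds: int = 20, rng: Optional[random.Random] = None) -> bool:
    """Miller-Rabin test: primes always pass; a composite passes all rounds
    with probability at most 4**-rounds."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:  # trial division handles small n and cheap factors
        if n % p == 0:
            return n == p
    if rng is None:
        rng = random.Random()
    # Write n - 1 = d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue  # a is not a witness; try another base
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True  # no witness found: n is prime with high confidence


def error_bound(rounds: int) -> float:
    """Upper bound on the chance that a composite passes `rounds` rounds."""
    return 4.0 ** -rounds
```

This is the property LLMs lack: here the error bound is a theorem about the algorithm, not an empirical measurement.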

kbolino 459 days ago

No, LLMs do not have "bounded reliability". All reliability figures for LLMs are based upon empirical observation in specific contexts using artificial benchmarks. As they say in finance, "past performance is not indicative of future results".

Saying LLMs are no worse than random bit flips is, again, an unjustified comparison. We can control bit errors with ECC; we cannot control the output of an LLM except by shackling it into uselessness.
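
The ECC point can be illustrated with the classic Hamming(7,4) code, which corrects any single flipped bit in a 7-bit codeword. This is a textbook sketch with hypothetical function names; ECC memory in practice commonly uses wider SEC-DED codes, but the principle (redundancy plus a syndrome that locates the error) is the same.

```python
from __future__ import annotations


def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword laid out p1 p2 d1 p3 d2 d3 d4."""
    if len(d) != 4:
        raise ValueError("expected 4 data bits")
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    if len(c) != 7:
        raise ValueError("expected a 7-bit codeword")
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

Flipping any one of the seven bits of `hamming74_encode([1, 0, 1, 1])` and decoding recovers `[1, 0, 1, 1]`. Two flipped bits, however, are beyond what this code can correct.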

TeMPOraL 459 days ago

I said bounded. I didn't say how tight. But all of science is about bounding empirical observations, so this is nothing new; neither is relying on systems with empirically established failure rates, which is a good chunk of what engineering is about.
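
One way "bounding empirical observations" can look in practice: given k observed failures in n trials, Hoeffding's inequality gives an upper bound on the true failure rate that holds with probability at least 1 - δ. This is a minimal sketch with a hypothetical function name, and it assumes the trials are independent and drawn from the same distribution as real use, which is exactly the assumption the "past performance" quote questions.

```python
import math


def failure_rate_upper_bound(failures: int, trials: int, delta: float = 0.05) -> float:
    """One-sided Hoeffding bound on the true failure rate.

    With probability at least 1 - delta, the true rate is at most the returned
    value. Assumption: trials are i.i.d. and representative of real use.
    """
    if trials <= 0 or not 0 <= failures <= trials:
        raise ValueError("need trials > 0 and 0 <= failures <= trials")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    observed = failures / trials
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * trials))  # shrinks like 1/sqrt(n)
    return min(1.0, observed + slack)
```

For example, 30 failures in 1,000 trials at δ = 0.05 gives a bound of roughly 0.069; the bound says nothing about inputs unlike those in the trials.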

kbolino 459 days ago

The number of 9s that can currently be assigned to these "bounds" is zero. They are not even 90% reliable. And there is no straightforward way to get to 90%, never mind 95%, 99%, etc. The sliding scale of reliability you originally presented just does not exist.
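
The "number of 9s" is conventionally -log10 of the failure rate, so 90% reliability is one nine, 99% two, and so on. A one-function sketch (the helper name is hypothetical):

```python
import math


def nines(reliability: float) -> float:
    """Number of nines: -log10 of the failure rate (0.9 -> 1, 0.99 -> 2, ...)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return -math.log10(1.0 - reliability)
```

By this measure, anything under 90% reliable scores below 1; `nines(0.7)` is about 0.52.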

Yeah, sure, we can hypothetically engineer a system that tolerates a key step with, say, a 30% chance of being wrong, including a 10% chance of being dangerously wrong (appears correct but is broken in subtle ways) and a 5% chance of being batshit insane. But why would we? The amount of training, vetting, and supervision of human operators needed to make such a process work immediately raises the question of whether the machine serves man or the other way around.
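
The hypothetical in this paragraph can be worked through numerically. The sketch below assumes, purely for illustration, a reviewer who catches every visibly wrong output (including the "batshit insane" ones) and misses every subtly broken one, and it reads both the 10% and 5% figures as part of the 30%. The function name is hypothetical.

```python
from __future__ import annotations


def review_outcomes(p_wrong: float, p_subtle: float) -> tuple[float, float]:
    """Return (rejection_rate, broken_share_of_accepted) for a hypothetical
    reviewer who rejects every visibly wrong output and accepts everything else.
    """
    if not 0.0 <= p_subtle <= p_wrong <= 1.0:
        raise ValueError("need 0 <= p_subtle <= p_wrong <= 1")
    p_visible = p_wrong - p_subtle  # obviously wrong outputs, caught by the reviewer
    p_accepted = 1.0 - p_visible    # correct outputs plus the subtly broken ones
    if p_accepted == 0.0:
        return 1.0, 0.0             # everything is rejected; nothing broken gets through
    return p_visible, p_subtle / p_accepted
```

With the figures above, `review_outcomes(0.30, 0.10)` gives roughly (0.20, 0.125): one output in five is sent back, and about one in eight of the outputs the reviewer approves is still subtly broken.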

The best uses of an LLM are those where engineering levels of precision are neither required nor useful.

dcow 459 days ago

I see people hallucinate on HN all the time. We tolerate it. Why should we? We should if including unreliable things (humans) provides value. The error rate for LLMs doesn't matter; the net value does. If the value is great enough to tolerate the error rate, we tolerate it. We don’t categorically dismiss a technology because it can fail really badly. We design things all the time that can fail catastrophically. Seriously. So LLMs will appear anywhere the net value is positive.

Maybe you’re taking a more nuanced stance, but I see a lot of “if it can hallucinate even once, we can’t use it” rhetoric here, and that’s simply irrational. Even “we can’t use it for important things” is wrong. Doctors are using LLMs today to help collate observed data and suggest diagnoses. A trained professional in the loop mitigates the “terrible failure”. So no, I don’t even agree that LLMs should be relegated to unimportant things.
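
The "net value" argument amounts to an expected-value calculation. The sketch below uses made-up, illustrative numbers and a deliberately simple model (a reviewer catches a fixed fraction of errors at a fixed cost per use); the function name and all parameters are hypothetical, not taken from the thread.

```python
def net_value(p_error: float, value_success: float, cost_error: float,
              p_caught: float = 0.0, review_cost: float = 0.0) -> float:
    """Expected value per use of an unreliable component.

    Correct outputs earn value_success; errors the reviewer catches are simply
    discarded (no value, no extra cost); errors that slip through cost cost_error.
    review_cost is paid on every use, whether or not there was an error.
    """
    for p in (p_error, p_caught):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    p_uncaught = p_error * (1.0 - p_caught)
    return (1.0 - p_error) * value_success - p_uncaught * cost_error - review_cost
```

With an error rate of 0.3, a value of 10 per success, a cost of 50 per uncaught error, a reviewer who catches 90% of errors, and a review cost of 1, the expected value per use is about 4.5; without review it is about -8. The error rate is the same in both cases; the loop around the model is what changes the sign.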

kbolino 458 days ago

I also think categorically dismissing LLMs is a mistake.

However, an LLM for automated code generation (the context of the thread as I understand it) is basically a dubious-code-copy-paster on steroids. That was already the wrong way to develop code to begin with; automating and accelerating it is not an improvement.

There has never been a single case where I took code from Stack Overflow, which is already a relatively high-quality source of such snippets, and didn't have to adapt it in at least some way to work with the code I already had. Heck, I often find that rewriting the snippet entirely is better than copying and pasting it. Of course, I also give attribution, both for credit and so I can refer back to the original in case I made a mistake, the best solution changes in the future, there's context I didn't cover, etc. And in between the problems I solve with other people's help is a whole lot of code I write entirely on my own.

Plenty of code in the wild is bad, not just from a "readability" or "performance" standpoint, but from a security standpoint. LLMs regurgitate bad code despite also having good code, and even the blog posts explaining what's good and what's bad, in their training corpus! And an LLM never gives attribution, partly because it was designed not to care, and partly because the end result is a synthesis of multiple sources rather than a pure regurgitation. Moreover, LLMs don't have much continuity, so they mix metaphors and naming conventions, tie things together in absurd ways, etc. The end result is an unmaintainable mess, even if it happens to work.

So no, an LLM is not like a compiler, even though compilers often have their own special brand of crazy magic that isn't necessarily good. Nor is it going to deliver a robust way to turn abstract human thoughts into concrete code. It is still a useful tool, but it's not going to be an automated part of developing quality code. And this is going to be true for any non-coding scenario that requires at least the same level of reliability.

dcow 458 days ago

It already is automating parts of developing quality code. You’ll just have to believe me on that one, I guess.

fc417fc802 458 days ago

Finance is an excellent analogy. Relying on LLM output is similar to relying on the stock market. You might come out ahead but it's always a gamble and the lower bound is always catastrophic failure.
