Hacker News new | ask | show | jobs
by distalx 42 days ago
A transmission error has a strictly contained, predictable blast radius. If a packet drops, the system knows exactly how to handle it: it throws a timeout, drops a connection, or asks for a retry. The worst-case scenario is known.

A reasoning error has an infinite, unpredictable blast radius. When an LLM hallucinates, it doesn't fail safely but it writes perfectly compiling code that does the wrong thing. That "wrong thing" might just render a button incorrectly, or it might silently delete your production database, or open a security backdoor.

You can build reliable abstractions over failures that are predictable and contained. You cannot abstract away unpredictable destruction.

4 comments

> A reasoning error has an infinite, unpredictable blast radius.

Says who? It’s quite easy to limit the blast radius of a reasoning error.

In 2024, a Chevy dealership deployed an AI chatbot that confidently agreed to sell a customer a 2024 Chevy Tahoe for $1. It executed a catastrophic business failure simply because it didn't know the logic was wrong.

Sure, you can patch that specific case with guardrails, but how many unpredictable edge cases are you going to cover? It only takes a user with a bit of ingenuity to circumvent them. There are already several examples of AI agents getting stuck in infinite loops, burning through massive API bills while achieving absolutely nothing.

You can contain a system failure, but you cannot contain a logic failure if the system doesn't know the logic is wrong.

This would be more convincing if a single car had been exchanged for $1.

It didn't happen. Seems the bug was "contained".

Sort of undermines your point re "catastrophic business failure" don't you think?

You literally can contain a logic failure. If I execute logic on my computer that’s not connected to the internet it can’t get out of the box. Done. Contained
> but how many unpredictable edge cases are you going to cover?

This is the wrong question. The correct question is what specific subsets of cases do you allow, similar to any security question

How so?

Suppose you had:

Math() Add() Subtract()

Program() Math(“calculate rate”)

This is intentionally written vaguely. How do you limit that these implementations ensure Program() runs and does the right thing when there is no guarantee Math() or its components are correct?

Normally you could use a typed programming language, unit tests, etc, but if LLM is the ultimate abstraction programs will be written line above. At some point traditional software engineering principles will need to apply.

Very few people are even beginning to understand the constraints of these systems, and none of them have yet been elevated to high enough positions of prominence to rise above the noise of all the hype. Give us some time man, jeez
A transmission error does not have a strictly contained blast radius.

A bad packet could tell a flying probe to fire all thrusters on and deplete its fuel in 15 minutes.

What makes a transmission error controlled is all the protection mechanisms on top of it. An LLM cannot delete a production database unless you give it access to do it.

My hot take is that many people are naturally more comfortable with deterministic systems that have clearly analyzable outcomes. Software engineering has historically primarily been oriented around deterministic systems and it has attracted that type of thinker.

But many of us, myself included, prefer chaotic systems where you can’t fully nail down every cause and effect. The challenge of building a prediction model on top of chaos is exhilarating. I really don’t find many people like me in SWE as in, say, the graphics design department.

To me, that’s the underlying threat here — LLMs are rewriting a field that has previously self selected a certain type of person and this, quite understandably, rubs them the wrong way.

Yes, but when all it takes to avoid this chaos is hiring someone with at least 5 or 10 years of experience for a reasonable wage, this entire perspective looks insane.

It's... just... not that hard to write code nor does it cost that much. There are millions of us working silently at places that aren't "big tech". We all shrugged our shoulders, took a sip of coffee, and went back to our Teams meetings where the only LLM usage is still just Copilot.

I don't need to be able to write proofs about my maths using logic and determinism. If the answer comes out in a way that I like then it has to be correct!
This is vapid condescension.

The comment you replied to made no statements about math or proofs. They made a statement about working in systems of non determinism effectively. Your statement seems to imply that this is dumb, as if working in a world of full determinism is an option.

Thank you for "vapid condescension".

I've wanted a term for this for decades!

when you do have the option of determinism, but intentionally eschew it in favour of a strictly inferior nondeterministic tool, then yes, it is kinda dumb.
What deterministic option are you referring to here? Humans certainly are not deterministic in how they interpret instructions and write code. If I asked you to implement a feature and a month later asked you to implement the exact same feature, you likely wouldn’t do it the same way again. Two different people certainly wouldn’t.
When you cling to determinism and call a clearly useful and powerful tool “strictly inferior” I would say this misses the point.
strictly inferior != bad. It's relative. One tool will still give the output i intended long after I'm dead and decomposed, with the other might not at the very next time i run it.
This is sorta how I've felt working the past ~7 years.

Simple example, we've been striving for 90% unit test coverage and thorough code review when there's 0% integration test coverage. I blame the metrics only looking at unit tests, but also many people think unit tests should come first. I would prioritize integration. There are some small pieces that need to work reliably, but if your system relies so hard on all of them working right, it's a bad system. That, and too many things will work in pieces but not overall.

Broadly I'm gonna assume that the team will later hire solid SWEs who don't necessarily know how our stuff works, and aren't going to read 100 docs about it. If this is a backend+DB combo, get your DB right and there won't be too many wrong ways to code against it in the future, get it wrong and it becomes a black hole for SWE-time. Or if someone on their first day can't run a system locally for debugging, no matter how elegant the code is, don't count on that system getting fixed quickly during an outage.

Insightful.

Feels like this maps to the J/P of Myers Briggs

I mean if your talking about packets, your already one abstraction over the real data Transmission, in wich is noisy. So bits can randomly flip, noise could be interpreted as bits, and bits could get lost. A much larger blast radius