Hacker News
by tpurves 50 days ago
The conceptual problem is that we keep wanting to compare AI behavior to that of traditional computers. The proper comparison is between AI, and how we trust or delegate to it, and the concept of delegating to other humans or even to domestic animals. Employees can be trained and given very specific skills and guidelines but still have agency and non-deterministic behavior. A seeing-eye dog, a pack mule or a chariot horse will often, but not necessarily always, do what you ask of them. We've only been delegating to deterministic programmable machines for a very short part of human history. But as human societies, we've been collectively delegating a lot of useful activities to not-perfectly-dependable agents (i.e. each other) for a very long time. And as humans we've gotten more than a few notable things done in the last several millennia with this method. However, humans as delegates or as delegators have also done a lot of horrific things at scale too, whether by accident or by design. And meanwhile (gestures broadly around everywhere) maybe humans actually aren't doing such an optimal job of running and governing everything important in the world?

When compared to how humans make a mess of things in the real world, how high does the bar really need to be for trusting AI agents? Even falling well short of perfect, AI could still be a step-function improvement over trusting ourselves.

5 comments

Human delegation is disciplined as much by incentive alignment as by instruction. The same is true for LLMs. The problem is that it's not possible to fully control the intentions of a delegate, LLM or human, because delegates/agents need autonomy to be useful.

The SOTA model makers are working on making their models more capable and then adding guardrails for safety. It would be better to work on baking in incentive alignment, which probably means eliciting more details about incentives from the LLM user. That's what I'd be working on at Apple, where the user might be induced to share local-only details that could help align the AI agents.

AI "disciplining" is arguably much simpler, easier and cheaper.
> how high does the bar really need to be for trusting AI agents.

You can hold a human responsible for what they do; you can reward them, fire them, sue them, etc.

You cannot do any of those things with an LLM. The threat of termination means nothing to an LLM.

Totally agree. And I expect at some point people might come around to “don’t pay for and use that tool for that particular job.”

Like, there isn’t enough hype in the world to make people replace all knives, hammers, and screwdrivers with Sawzalls. They have awesome utility for certain things and they’re a bad fit for other things.

Maybe we’ll get there with LLMs someday.

Well, AI agents’ thinking capabilities are inspired by our own “neural networks.” AI makes the same mistakes we do; it’s just called different things.

How many people say something like “if I recall correctly”? That phrase signals that we think we know, but we add the disclaimer to protect ourselves from cancel culture.

People call that “Hallucination” when talking about an AI. It’s not hallucination, it’s beautiful imperfection.

> And meanwhile (gestures broadly around everywhere) maybe humans actually aren't doing such an optimal job of running and governing everything important in the world?

The issue with this is that you want to impugn, in the grand scheme of things, a small few individuals. And so you want to institute an AI system, which is controlled by the same individuals (or at least the same class of individuals, with the reach to abuse such a system).

I'll hear you out if AI becomes truly decentralized. Until then, no, this line of rhetoric is just justification for the surveillance state that's to come (to be fair, the surveillance state would pick yet another justification, regardless).

A very talented junior employee that you can't trust with the keys.
The main difference is that this junior employee can't be held responsible if anything goes wrong. And the company which rented you this employee absolves itself from all responsibility too.

Here is a fresh example from today of what a junior employee does when given unlimited agentic power: https://www.reddit.com/r/ClaudeAI/comments/1sv7fvc/im_a_nurs...

Your example is not from a Jr developer but from a free agent.

I think you will find it very hard to hold a Jr dev in a corp responsible.

I actually think you will find that it is easier to work with agents, at higher quality and lower legal risk, than with Jr developers.

And this is only going to be amplified when it becomes common knowledge that AI poses less risk to projects than Jr staff.

How would you hold the junior responsible if they bring down the system and it costs a million dollars? Will the company actually recover all the money from the employee?
I understand you mean it is close to that in terms of the final work produced.

But in my opinion, it is not even remotely close to the reliability of an educated human, communication-wise.

If you gave a research task to a less experienced person, you wouldn’t expect them to convincingly lie about details.

It is useful as a review tool or boilerplate generator, but it doesn’t fill the same role you would use a human for.

Who do you trust with the keys? In any well-run organization you have multiple layers of controls. The same concept applies here, and I think the GP commenter captured it very well.
I think you'd trust someone with the keys when they've consistently shown that they can be trusted with less critical work. If you're having to constantly monitor someone's output, then promoting them is a liability.

The same applies to an AI model.

And, since the same model would be deployed by many teams, unexpected behavior from that model even for a small subset of those teams means that it can't be promoted.

> In any well-run organization you have multiple layers of controls.

Everything depends on size.

A business with 8 employees might need 3 of them to be (literal) keyholders, and might be situated such that any of the keyholders has it in their power to destroy the business.

This is not ideal, obviously, but it is how the world has worked for a very long time, and it is difficult to understand how to make it better in some cases. Modern technology, such as cameras, might help, or might simply help to allocate blame after destruction has occurred.

In any case, this is the background of how people are used to working. We all deal with people who can absolutely destroy us, starting with the cop on the corner.

And we have mechanisms, both before-the-fact, like social coercion, and after-the-fact, like the legal system, to help ensure that this usually works.

LLMs exist in a world where most people are used to extending trust, but it isn't possible for LLMs to conform to the historical expectations that underpin that trust.

Yes. I think you can get agents to “Conscious competence” with a lot of well-designed oversight, direction and control. It works, but it’s fragile - nothing like the judgement needed to handle novel situations well.

https://en.wikipedia.org/wiki/Four_stages_of_competence