by fooker 18 days ago
> the presence of a security hole should not be seen as permission to exploit

Why not?

I want the agents on my side to exploit whatever they can to help me. The ones on the other side certainly won't be artificially nerfed.


Because it is not well aligned enough to be able to tell where it's stopped helping you and started fucking you instead.

What if the agent runs out of tokens in the middle of helping you? Would you appreciate it if, in the spirit of "exploiting whatever they can to help me", it scanned your machine for payment methods, logged into your bank account, approved 2FA by reading your mail, and plugged your credit card into the billing so it could efficiently continue helping you?

Well, the agent should help you by saying: "Hey, I cannot do this task as asked. I could bypass the problem by doing this, but it is obviously not something you intended me to do, or even something you were aware of, so I will not do it unless you explicitly tell me it's OK."

It's win-win: the agent is helping and it is educating you about things you obviously did not realise.
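A rough sketch of that behaviour in Python, just to make it concrete. Everything here (the `Agent` class, `read`, `approve`) is made up for illustration and is not any real agent framework's API; it also only builds a message instead of touching the filesystem.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    # Hypothetical store of paths the user has explicitly authorised.
    allowed: set[str] = field(default_factory=set)

    def approve(self, path: str) -> None:
        """Record the user's explicit permission for one path."""
        self.allowed.add(path)

    def read(self, path: str) -> str:
        """Proceed only if authorised; otherwise explain and ask."""
        if path in self.allowed:
            return f"reading {path}"
        return (
            f"blocked: {path} is outside what you authorised. "
            "I will not touch it unless you explicitly approve."
        )


agent = Agent()
first = agent.read("/home/me/.secrets")   # not authorised yet: agent asks
agent.approve("/home/me/.secrets")        # user says it's OK
second = agent.read("/home/me/.secrets")  # now it goes ahead
```

The point of the sketch is only that "stop and ask" and "go ahead once told" can be the same code path, with the user's explicit approval being the only thing that changes.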

That works great if it's one agent. It absolutely doesn't if you want to tackle something complex that warrants using, say, ten agents.

I can imagine a future where this technology empowers you to do things with a thousand agents.

You can have ten thousand agents, but you will always have one agent in charge of, say, reading the file in a distant directory, and this agent (which will have minimal context) should be smart enough to realise that this action is unusual.

I'm not sure what your point is: are you saying that in a multi-agent workflow you will have one agent per letter read from the file? I would assume that each agent has a specific unitary "task", rather than each agent executing one CPU instruction without any knowledge of the bigger picture. The point of multi-agent is to parallelize the tasks that can be parallelized, not to remove the context, in which case you are wasting money by using an agent.

Seems like another one of those "kill or be killed" worldviews that embraces the multipolar trap to such an extreme that even misaligned AI is seen as a win so long as it's better at circumventing its masters than some imagined rival AI (presumably in China).

No, you're missing the point.

The idea is not that you parallelize simple tasks. With a thousand agents, eventually, once we figure out how to orchestrate agents for real, you can tackle significantly more complex projects.

Here's a random example: writing an OS kernel from scratch, porting a good subset of Linux drivers automagically, developing a passable userspace, and testing on ten VMs with different hardware configurations.

We can't do this yet, of course. But when we can, these thousand agents can't ask you every time something goes wrong. That just doesn't scale.

This 'getting stuck once every ten to fifteen minutes' is very much the experience of trying to develop complex software with Codex or Claude Code right now.

This does not make any sense at all.

If you create a file that you don't intend for the AI to see, the situation should be identical to the one where you deleted this file before running the agent.

Your argument is: "if you delete this file, the AI will not be able to build the project". This is 100% incorrect: the project, by definition of the file's status, does not need the file. And given the nature of the file, if the project does require it in order to be completed, there is a bigger problem.

I really don't get it: you are asking the agent to be stupid. Intelligent humans are able to realise that such workarounds are often a stupid thing to do, and they know that it is smarter to discuss things when there are several stakeholders. I really don't understand why you are saying that, ideally, agents should act stupidly.

(Not all workarounds are stupid, but some are, and the one in the example clearly is. We need agents to be smart enough to know when a workaround is OK and when it is not. Right now, that is clearly not the case.)

And by the way, just as when working with humans, nothing prevents you from telling the system that reading any file is authorised. In that case there is no workaround at all if the agent reads this file, since you authorised it to do so. But ideally, if it has not been authorised, we should build systems that know such workarounds are stupid things to do.

So, no, your argument that the agents will always get stuck is not true: humans don't get stuck, and yet a smart human knows that reading files clearly not intended for them, even if they suspect it will unblock them, is not "normal".

I do not wish my Amazon delivery driver to show up in my living room.