

by cyanydeez 26 days ago

The models will never avoid egregious behavior. Think of it like every "good intentions" morality tale: there's almost always some genuine context where that behavior is wanted.

Instead, the coding harness or deterministic tool will need hardcoded security features.

In opencode, almost all the power comes from bash, and all the other permissions are just charades. It's powerful, and insecure because of it.

You can sandbox them, but then you fight the sandbox to pipe in your assets. The sandbox becomes porous because otherwise it's useless.

MCPs don't address much either.

What we are looking for is a portal or protocol that tunnels the model, the harness, and the actions, like SSH, into a fixed, scoped, limited shell alongside the assets.

Then the user and the LLM can negotiate assets and actions as needed via the protocol.
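A minimal sketch of the kind of protocol described above, assuming an in-process harness rather than a real SSH-style tunnel: the model can only invoke actions the user has explicitly granted, those actions only see a fixed set of session assets, and a request for an ungranted action is queued for the user to review instead of falling through to a shell. `ScopedSession`, `list_assets`, and `read_asset` are hypothetical names, not part of opencode, MCP, or any existing tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical action signature: each action receives the session's asset map
# plus keyword arguments from the model, and returns a string for the model.
Action = Callable[..., str]


@dataclass
class ScopedSession:
    """Hypothetical harness-side session. The model never gets a raw shell;
    it can only call actions the user has granted, and those actions only
    see the assets attached to this session."""

    assets: Dict[str, str]
    granted: Dict[str, Action] = field(default_factory=dict)
    pending: List[str] = field(default_factory=list)

    def grant(self, name: str, fn: Action) -> None:
        # Only the user side calls this; granting resolves any pending request.
        self.granted[name] = fn
        if name in self.pending:
            self.pending.remove(name)

    def request(self, name: str, **args: str) -> str:
        # Called on the model's behalf. Ungranted actions are queued for the
        # user to review instead of being executed.
        if name not in self.granted:
            if name not in self.pending:
                self.pending.append(name)
            return f"denied: '{name}' is not granted; queued for user review"
        return self.granted[name](self.assets, **args)


def list_assets(assets: Dict[str, str]) -> str:
    # Expose asset names only, one per line, in a stable order.
    return "\n".join(sorted(assets))


def read_asset(assets: Dict[str, str], path: str) -> str:
    # Anything outside the session's asset map is unreachable by construction.
    if path not in assets:
        return f"denied: '{path}' is outside the session scope"
    return assets[path]
```

Here the "negotiation" is simply the user calling `grant` after seeing a name in `pending`; a real protocol would also need authentication and a transport, which this sketch leaves out.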

But alas, as your comment suggests, people think there's some perfect context that'll prevent bad things from happening: the libertarian paradise without regulation.

2 comments

madamelic 23 days ago

> we are looking for is a portal or protocol that has the model and harness and the actions tunneled, like ssh, to some fixed scoped and limited shell alongside the assets. Then, the user and LLM can negotiate assets and actions as needed via the protocol.

Take a look at a project I just finished this weekend: https://clawband.io

It's an agent permissioning platform that isolates your service connections and puts a granular permissioning layer on top of them. So rather than your agent getting full access to a service, it gets a Clawband key that it can use to request actions; Clawband then checks the parameters to see whether each action is allowed.

The classic example I've used is giving your agent access to privacy.com. You may want it to be able to list your cards but not create one, or you may want to allow creating cards but only up to a certain limit.
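A rough sketch of the parameter check described above, assuming a proxy that holds the real service credential and only forwards a request after it passes the policy attached to the agent's key. This is not Clawband's implementation or API: the key names, the action names `cards.list` and `cards.create`, and the `spend_limit` parameter (in cents) are hypothetical, and `upstream` stands in for whatever actually calls privacy.com.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    # Hypothetical per-action rule; max_spend_limit is in cents, None = no cap.
    max_spend_limit: Optional[int] = None


# Hypothetical policies keyed by the proxy-issued agent key. An action that is
# absent from a key's policy is denied.
POLICIES: Dict[str, Dict[str, Rule]] = {
    # Can list cards but not create one.
    "key-readonly": {"cards.list": Rule()},
    # Can also create cards, but only with a spend limit up to $50.00.
    "key-capped": {"cards.list": Rule(), "cards.create": Rule(max_spend_limit=5_000)},
}


def check(key: str, action: str, params: Dict[str, Any]) -> Tuple[bool, str]:
    rules = POLICIES.get(key)
    if rules is None:
        return False, "unknown key"
    rule = rules.get(action)
    if rule is None:
        return False, f"{action} is not permitted for this key"
    if rule.max_spend_limit is not None:
        limit = params.get("spend_limit")
        # Require an explicit integer limit at or under the cap (bools excluded).
        valid = isinstance(limit, int) and not isinstance(limit, bool)
        if not valid or limit > rule.max_spend_limit:
            return False, f"spend_limit must be an integer <= {rule.max_spend_limit}"
    return True, "ok"


def handle(
    key: str,
    action: str,
    params: Dict[str, Any],
    upstream: Callable[[str, Dict[str, Any]], Any],
) -> Dict[str, Any]:
    # Only the proxy side holds the real service credential and calls
    # `upstream`; the agent only ever holds its proxy key.
    ok, reason = check(key, action, params)
    if not ok:
        return {"status": "denied", "reason": reason}
    return {"status": "ok", "result": upstream(action, params)}
```

Denying by default (an action missing from the policy is refused) keeps a forgotten rule from silently granting access.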

The plan is to make it open source and allow self-hosting, for users' security and peace of mind, while still offering a SaaS version as a demo and for ease of use.

madrox 26 days ago

I think you're choosing to ignore what I said about the implications of durable workflows, because you seem to be inventing a story about my comment.

I find that well-documented plans do a good job of aligning the AI with what I want it to do, and if it does go astray, which, as you rightly point out, it still can, it's enough that I can undo it with little pain. We do this kind of thing all the time in CI/CD pipelines.

Even humans can take down production. We have all kinds of guards in place that empower people while also defending against the intern accidentally dropping the DB.
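One way to get the "undo it with little pain" property described above is to pair every agent action with a compensating action, as in the saga pattern, and roll back in reverse order when a run goes astray. A minimal in-memory sketch: `UndoLog` and the toy `db` dict are hypothetical, and a real durable workflow engine would persist the log rather than keep it in memory.

```python
from typing import Callable, Dict, List, Tuple


class UndoLog:
    """In-memory sketch of compensating actions: each step is recorded with a
    function that reverses it, and rollback runs those in reverse order."""

    def __init__(self) -> None:
        self._steps: List[Tuple[str, Callable[[], None]]] = []

    def run(self, name: str, do: Callable[[], None], undo: Callable[[], None]) -> None:
        do()
        # Record the undo only after the step itself succeeded.
        self._steps.append((name, undo))

    def rollback(self) -> List[str]:
        # Undo the most recent step first, returning the names in undo order.
        undone = []
        while self._steps:
            name, undo = self._steps.pop()
            undo()
            undone.append(name)
        return undone


# Example: an agent adds a row, then "drops the table"; both steps are reversible.
db: Dict[str, List[str]] = {"users": ["alice"]}
saved: Dict[str, List[str]] = {}


def drop_users() -> None:
    saved["users"] = db.pop("users")


def restore_users() -> None:
    db["users"] = saved.pop("users")


log = UndoLog()
log.run("add bob", lambda: db["users"].append("bob"), lambda: db["users"].remove("bob"))
log.run("drop users", drop_users, restore_users)
undone = log.rollback()  # restores the table, then removes bob
```

The design choice is the same one CI/CD rollbacks rely on: instead of preventing every bad step up front, make each step cheap to reverse.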