by buremba
117 days ago
My take is that agents should, by default, only take actions you can recover from. You can gradually give them more permissions and build guardrails such as extra LLM auditing, time-boxed whitelisted domains, etc. That's what I'm experimenting with: https://github.com/lobu-ai/lobu

1. Don't let it send emails from your personal account; only let it draft the email and share the link with you.
2. Use incremental snapshots, and if the agent bricks itself (it often does with Openclaw if you give it access to change config), just /revert to the last snapshot. I use VolumeSnapshot for lobu.ai.
3. Don't let your agents see any secrets. Swap the placeholder secrets at your gateway and put a human in the loop for secrets you care about.
4. Don't give your agents direct outbound network access. They should only talk to your proxy, which has a strict domain whitelist. There will be cases where the agent needs to talk to other domains, and for those I use time-box limits (only allow certain domains for the current session for 5 minutes, and at the end of the session review all the URLs it accessed). You can also use tool hooks to audit the calls with an LLM to make sure they weren't triggered by a prompt injection attack.

Last but not least, use proper VMs like Kata Containers and Firecracker in production, not just Docker containers.
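To make the time-boxed whitelist idea concrete, here is a rough Python sketch of the decision a proxy could make per request. It is not how lobu implements it; `EgressPolicy`, `grant`, and `is_allowed` are hypothetical names, matching is exact-host only (no subdomain wildcards), and the 300-second default just mirrors the 5-minute window mentioned above.

```python
import time
from urllib.parse import urlsplit


class EgressPolicy:
    """Hypothetical allow/deny check a proxy could run for each agent request."""

    def __init__(self, permanent, clock=time.monotonic):
        self.permanent = {d.lower() for d in permanent}  # always-allowed hosts
        self.grants = {}      # host -> expiry timestamp for time-boxed access
        self.audit_log = []   # (url, allowed) pairs for end-of-session review
        self.clock = clock    # injectable so the expiry logic can be tested

    def grant(self, domain, ttl_seconds=300):
        """Temporarily allow a host; 300 s is an assumed default."""
        self.grants[domain.lower()] = self.clock() + ttl_seconds

    def is_allowed(self, url):
        """Return True if the URL's host is allowed right now, and log the attempt."""
        host = (urlsplit(url).hostname or "").lower()
        expiry = self.grants.get(host)
        timed_ok = expiry is not None and self.clock() < expiry
        allowed = host in self.permanent or timed_ok
        self.audit_log.append((url, allowed))
        return allowed
```

Every attempt lands in `audit_log` whether it was allowed or not, so the end-of-session review sees blocked requests too, which is often where a prompt injection shows up first.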
One problem I'm finding with discussions about automation or semi-automation in this space is that there are many different use cases for many different people: a software developer deploying an agent in production vs an economist using Claude vs a scientist throwing a swarm at common ML exploratory tasks.
Many of the recommendations will feel like too much or too little complexity for what people need, and the fundamentals get lost: intent for design, control, the ability to collaborate if necessary, and fast iteration thanks to an easy feedback loop.
AI evals, sandboxing, and observability seem like three key pillars for maintaining intent in automation. But how to help these different audiences be safely productive while moving fast, and speak the same language when they need to build products together, is what is mostly occupying my thoughts (and practical tests).