If I run the agent outside and use the MCP, is it the model's responsibility to actually develop in the sandbox or are there deterministic guardrails against performing activities outside of it?
it's the former: when the sandbox is exposed via MCP, it's up to the model. the agent decides when to call it, and if it has any other way to run code (a shell tool, another MCP server), nothing forces it to go solely through the sandbox.
to actually enforce isolation, the sandbox needs to be the only way the agent can run code. either don't give it any other code-running tools, or the harness exposes sandboxing as a first-class concept. for the latter, anthropic makes this possible with self-hosted execution: https://platform.claude.com/docs/en/managed-agents/self-host...
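to make the first option concrete, here's a minimal sketch of a harness that only ever registers one code-running tool. everything in it is hypothetical (the `Harness` class, the `sandbox_exec` tool name, the `run_in_sandbox` stub); it's not any real MCP or anthropic API, just an illustration of where the deterministic check sits.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


class ToolNotAllowed(Exception):
    """Raised when the model asks for a tool the harness never registered."""


@dataclass
class Harness:
    # Hypothetical harness: the model can only reach tools registered here.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def dispatch(self, name: str, arg: str) -> str:
        # Deterministic guardrail: unknown tools are rejected by the harness,
        # regardless of what the model decided to call.
        if name not in self.tools:
            raise ToolNotAllowed(name)
        return self.tools[name](arg)


def run_in_sandbox(code: str) -> str:
    # Stub standing in for a real sandbox client; it does not execute anything.
    return f"[sandbox] would execute: {code}"


harness = Harness()
# The sandbox is the only code-running tool; no shell tool is ever registered.
harness.register("sandbox_exec", run_in_sandbox)
```

with this shape, a call like `harness.dispatch("shell", "ls")` fails in the harness itself, so isolation doesn't depend on the model choosing the right tool.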
happy to answer any other questions.