Hacker News
by 827a 12 days ago
You’re right that a shell is the ultimate tool, and an agent with a shell seems to perform better than one without. But making shells safe is really damn hard, e.g. when you’re running an agent on behalf of a SaaS customer in your AWS environment. For now, some companies are accepting the performance/security tradeoff of disabling the shell and focusing on specialized tools.

Remember: jq can always be a tool (MCP or otherwise). In this way you can allowlist specific CLI programs and give them to the agent via tools. Making Python a tool is more difficult; it would have all of the same RCE injection issues that the shell has.

There are isolation stacks that help make “running an agent with a shell on behalf of a customer in the cloud” possible. It’s just very risky. There are a thousand attack vectors, and to a very real degree companies that are getting to this point are rethinking their cloud infrastructure and architecture from first principles.


jq cannot be just an MCP, unless it's acceptable that you pass all data through the context. If that's not acceptable and you want to have it as a tool, then you need some other way to handle the data.

I think the basic solution to this is to have a "static shell" with modern tools for the agents, one that does not actually execute other binaries. It could have things like jq, curl, piping and redirection to/from session files. Maybe even Python if it can be made safe. If not, then there are a lot of languages that can be.
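To make the "static shell" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the builtin names (`cat`, `grep`, `count`), the `SESSION_FILES` dict standing in for session storage, and the tiny grammar (stages separated by `|`, one optional `> file` at the end, whitespace around operators). Nothing here spawns a process or touches the real filesystem.

```python
import shlex

# Hypothetical per-session storage; a real harness might back this with
# a sandboxed directory instead of a dict.
SESSION_FILES: dict[str, str] = {}


def cmd_cat(args, stdin):
    # Concatenate the named session files.
    return "".join(SESSION_FILES[name] for name in args)


def cmd_grep(args, stdin):
    # Keep only lines containing the literal substring args[0].
    needle = args[0]
    return "".join(
        line for line in stdin.splitlines(keepends=True) if needle in line
    )


def cmd_count(args, stdin):
    # Count input lines.
    return f"{len(stdin.splitlines())}\n"


# The whole command vocabulary. Anything not listed here cannot run.
BUILTINS = {"cat": cmd_cat, "grep": cmd_grep, "count": cmd_count}


def run(line: str) -> str:
    """Run `cmd args | cmd args > file` entirely in-process."""
    # Limitation of this sketch: operators must be separate tokens, and a
    # quoted "|" or ">" is not distinguished from an operator.
    tokens = shlex.split(line)

    target = None
    if ">" in tokens:
        i = tokens.index(">")
        if i != len(tokens) - 2:
            raise ValueError("expected exactly one file name after '>'")
        target = tokens[i + 1]
        tokens = tokens[:i]

    # Split the remaining tokens into pipeline stages.
    stages, current = [], []
    for tok in tokens:
        if tok == "|":
            stages.append(current)
            current = []
        else:
            current.append(tok)
    stages.append(current)

    data = ""
    for stage in stages:
        if not stage or stage[0] not in BUILTINS:
            raise PermissionError(f"unknown command: {stage[:1]}")
        data = BUILTINS[stage[0]](stage[1:], data)

    if target is not None:
        SESSION_FILES[target] = data  # redirected output never reaches the model
        return ""
    return data
```

With this shape, `run("cat log.txt | grep err > errs.txt")` moves data between stages and into a session file without any of it being echoed back as tokens; only unredirected output is returned to the caller.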

> Maybe even Python if it can be made safe.

https://github.com/pydantic/monty

jq can 100% be an MCP tool. Remember: Agent tools do not have to involve a network boundary. They can be natively implemented inside the agent harness, and/or they can be provided via a local MCP server. The point of making it a tool is to tightly allowlist what the agent is capable of executing; it can only execute jq, not any shell program, and moreover it isn't allowed to do things like redirection, pipes, etc; all it has at its disposal is `jq (filters) (data)`.
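A rough sketch of what "natively implemented inside the agent harness" could look like, in Python. The names `ALLOWED_TOOLS`, `build_argv`, and `run_jq` are made up for illustration, and the hardening choices (rejecting filters that start with `-`, running with an empty environment because jq filters can read environment variables through `env`) are my assumptions, not a complete security review. The key point is that the filter is passed as a single argv element with no shell in between, so pipes and redirection in the string are never interpreted by a shell.

```python
import shutil
import subprocess

ALLOWED_TOOLS = {"jq"}  # assumption: the only program this harness exposes


def build_argv(tool: str, jq_filter: str) -> list[str]:
    """Validate the request and build an argv list (never a shell string)."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowlisted: {tool}")
    if jq_filter.startswith("-"):
        # Otherwise the "filter" would be parsed as a command-line option.
        raise ValueError("filter must not look like a command-line option")
    return [tool, jq_filter]


def run_jq(jq_filter: str, data: str, timeout: float = 5.0) -> str:
    """Run `jq <filter>` with `data` on stdin and return its stdout."""
    argv = build_argv("jq", jq_filter)
    path = shutil.which(argv[0])
    if path is None:
        raise FileNotFoundError("jq is not installed")
    result = subprocess.run(
        [path, *argv[1:]],
        input=data,           # data arrives on stdin, not via a file path
        capture_output=True,
        text=True,
        timeout=timeout,
        env={},               # empty environment: nothing for `env` to leak
        check=True,           # non-zero exit raises CalledProcessError
    )
    return result.stdout
```

Because `subprocess.run` receives a list and `shell=True` is never set, a string like `. | .a > /etc/passwd` is handed to jq as one filter argument rather than being parsed by a shell.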

People seem to think that MCP exists to give agents more capability. That could not be further from the truth, which is actually the opposite: MCP exists to take capability away from agents. It exists to control them.

Let's say you have a jq MCP. How do you pass data in and out to/from it without the data also being processed as tokens?

That's really my only issue with MCPs.

With a shell you can pass data from one component to another directly, which is not only cheaper and faster but also preserves complete integrity. While models nowadays seem to do data echoing well, there's always the chance they might not do it exactly.

There's no reason a shell could not also limit the abilities of the party using it, simply by implementing only the desired functionality. What makes it more advanced in this context is the (standard) ability to express how to connect multiple components to each other, or to/from local storage. MCP does not have this.

Providing that is no more inherently dangerous than jq's functions are. Actual execution of processes or access to real files does not need to be involved.

> Let's say you have a jq MCP. How do you pass data in and out to/from it without the data also being processed as tokens?

Provide a meta-tool which handles piping data in and out of any other tool, and make specific tools which can read/write data sources directly, bypassing context. Or you could go full code mode, but I'm not sure it's worth the lift unless you have Cloudflare numbers of APIs which would need tools.

I work on an internal model/vendor-independent chat app where the agent runs in the browser - every chat gets its own virtual origin private filesystem (OPFS) [1] directory where user attachments get written to and tools can read from/write to, and users can also provide read/write access to a real directory with window.showDirectoryPicker() [2] (both use the same API, so tools can route to/from either).

It can push and pull MBs of data through tools, e.g. pulling huge spreadsheets directly from SharePoint in 50,000 row chunks using a tool which calls the Excel Services REST API, passing those all into a code execution tool to join them together and process them, which generates an Excel output file using SheetJS, none of which goes into context.
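The chunked pull could look roughly like this in Python (the app described above runs in the browser, so this is only an analogy). `fetch_chunk` is a hypothetical callable standing in for whatever actually talks to the data source, and `pull_to_file` is a made-up name; the point is that rows stream straight into a file and only a short summary is returned to the agent.

```python
import csv
from typing import Callable, Sequence


def pull_to_file(
    fetch_chunk: Callable[[int, int], Sequence[Sequence[object]]],
    out_path: str,
    chunk_rows: int = 50_000,
) -> dict:
    """Pull rows in fixed-size chunks and write them to a CSV file.

    `fetch_chunk(offset, limit)` is assumed to return up to `limit` rows
    starting at `offset`, and an empty sequence once the source is exhausted.
    """
    total = 0
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        while True:
            rows = fetch_chunk(total, chunk_rows)
            if not rows:
                break
            writer.writerows(rows)
            total += len(rows)
    # Only this small summary would go back into the model's context.
    return {"path": out_path, "rows": total}
```

A later tool (for example a code execution tool) can then open `out_path` itself, so the row data never has to pass through the model.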

People used to drag their multi-MB documents in and complain either it didn't work or the agent couldn't do anything useful with it. Now it just works.

[1] https://developer.mozilla.org/en-US/docs/Web/API/File_System...

[2] https://developer.mozilla.org/en-US/docs/Web/API/Window/show...

So are you using MCP to do this?

I'm not saying MCP or the ways we use it cannot be extended to cover this use case, but my understanding is that nobody does it. But shell/code does, and more.

This is the first sensible thing I’ve read in defense of MCP.

However, wouldn’t a shell where all permissions are off by default, and where you then enable read and write privileges on certain files and directories and execute privileges on certain binaries, accomplish the same thing using UNIX permissions? Isn’t this still reinventing the wheel?