by 827a 19 days ago
Yes, in the same way a programming language would be worse off if its designers focused all of their effort on building an implementation instead of a language specification.

You could literally, deterministically, zero AI, code-gen a CLI from an MCP specification, just like you can with an OpenAPI specification. I'm sure tools exist that do this. So if you want a CLI, there it is.
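To make the "zero AI code-gen" point concrete, here is a minimal sketch of how a CLI could be derived mechanically from tool definitions. It assumes the definitions look like the entries returned by an MCP `tools/list` call (a `name`, a `description`, and a JSON Schema under `inputSchema`). The `get_issue` tool, the `mcp-cli` program name, and the helper names are made up for illustration, and only flat schemas with scalar properties are handled.

```python
import argparse

# Hypothetical tool listing, shaped like the result of an MCP tools/list call.
TOOLS = [
    {
        "name": "get_issue",
        "description": "Fetch one issue by id",
        "inputSchema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer", "description": "Issue id"},
                "verbose": {"type": "boolean", "description": "Include comments"},
            },
            "required": ["id"],
        },
    },
]

# JSON Schema scalar types mapped to Python converters; anything else stays a string.
JSON_TO_PY = {"string": str, "integer": int, "number": float}


def build_parser(tools):
    """Turn tool definitions into an argparse CLI, one subcommand per tool."""
    parser = argparse.ArgumentParser(prog="mcp-cli")
    subcommands = parser.add_subparsers(dest="tool", required=True)
    for tool in tools:
        sub = subcommands.add_parser(tool["name"], help=tool.get("description"))
        schema = tool.get("inputSchema", {})
        required = set(schema.get("required", []))
        for name, prop in schema.get("properties", {}).items():
            flag = "--" + name.replace("_", "-")
            if prop.get("type") == "boolean":
                # Booleans become on/off switches that default to False.
                sub.add_argument(flag, dest=name, action="store_true",
                                 help=prop.get("description"))
            else:
                sub.add_argument(flag, dest=name,
                                 type=JSON_TO_PY.get(prop.get("type"), str),
                                 required=name in required,
                                 help=prop.get("description"))
    return parser


def parse_call(tools, argv):
    """Return (tool_name, arguments), ready to send as a tools/call request."""
    namespace = vars(build_parser(tools).parse_args(argv))
    tool_name = namespace.pop("tool")
    return tool_name, namespace
```

Calling `parse_call(TOOLS, ["get_issue", "--id", "42"])` yields the tool name plus an arguments dict; actually sending the request to a server is left out of the sketch.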

But the problem with a CLI is that it requires a shell environment, and not every place you may want to run an agent should or can have access to a shell. MCP enables the harness to tightly control that access. MCP allows the user to easily allowlist/denylist specific tools, or categorize tools into "ask me every time" versus "don't ask me, just do it". Doing any of this with a CLI is really hard because CLIs are all very different. Yes, AIs can easily learn how to use them, but that might be exactly what you don't want: "hey AI, don't issue that aws ec2 terminate-instances command... ah crap, there it goes. I wish I could have just denylisted its access to that tool."
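A rough sketch of what that per-tool gating might look like inside a harness. The tool names, the `POLICY` table, and the `gate` function are all hypothetical; the one design choice worth noting is that unknown tools default to deny.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"   # don't ask me, just do it
    ASK = "ask"       # ask me every time
    DENY = "deny"     # never run


# Hypothetical per-tool policy; anything not listed is denied.
POLICY = {
    "ec2_describe_instances": Decision.ALLOW,
    "ec2_stop_instances": Decision.ASK,
    "ec2_terminate_instances": Decision.DENY,
}


def gate(tool_name, policy=POLICY, confirm=lambda name: False):
    """Decide whether a tool call may proceed.

    `confirm` is a callback that asks the user and returns True or False;
    the default refuses, so an unattended run never silently approves.
    """
    decision = policy.get(tool_name, Decision.DENY)
    if decision is Decision.ALLOW:
        return True
    if decision is Decision.ASK:
        return bool(confirm(tool_name))
    return False
```

Because every call arrives as a named tool with structured arguments, the check is a dictionary lookup; doing the same for arbitrary shell command lines would mean parsing each CLI's own syntax.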


Not having access to the shell is a big hindrance. I have my agent access Gitlab and Jira via CLI tools and in so many cases jq or python is used to manipulate or combine the data into a more useful format, making use of pipes and temporary files. You can of course limit what an agent can do, most easily by not giving it access to things it shouldn't do. I suppose there are no existing easy gateway methods to grant fine-grained OS-level permissions to add such things back, except perhaps `sudo` and similar tools.

MCPs are impossible to combine this way: everything you feed or get from them goes through the model and consumes tokens.

You’re right that having a shell is the ultimate tool, and an agent with a shell seems to perform better than one without one. But, making shells safe is really damn hard; e.g. in the context of running an agent on behalf of a SaaS customer in your AWS environment. For now some companies are accepting the performance/security tradeoff of disabling the shell and focusing on specialized tools.

Remember: jq can always be a tool (MCP or otherwise). In this way you can allowlist specific CLI programs and give them to the agent via tools. Making python a tool is more difficult; that would have all of the same RCE injection issues that the shell would have.

There are isolation stacks that help make “running an agent with a shell on behalf of a customer in the cloud” possible. It’s just very risky. There are a thousand attack vectors, and to a very real degree companies that are getting to this point are re-thinking their cloud infrastructure and architecture from first principles.

jq cannot be just an MCP, unless it's acceptable that you pass all data through the context. If that's not acceptable and you want to have it as a tool, then you need some other way to handle the data.

I think the basic solution to this is to have a "static shell" but with modern tools for the agents, not actually executing other binaries. It could have things like jq, curl, piping and redirection to/from session files. Maybe even Python if it can be made safe. If not, then there are a lot of languages that can be.

> Maybe even Python if it can be made safe.

https://github.com/pydantic/monty

jq can 100% be an MCP tool. Remember: Agent tools do not have to involve a network boundary. They can be natively implemented inside the agent harness, and/or they can be provided via a local MCP server. The point of making it a tool is to tightly allowlist what the agent is capable of executing; it can only execute jq, not any shell program, and moreover it isn't allowed to do things like redirection, pipes, etc; all it has at its disposal is `jq (filters) (data)`.
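As a sketch of the "only `jq (filters) (data)`" idea: the harness builds the argument list itself and never goes through a shell, so pipes and redirection simply do not exist for the agent. The function names are hypothetical, and this is not a complete sandbox (jq filters can still read environment variables through `env`, so a real harness would presumably scrub the environment too).

```python
import subprocess


def jq_argv(filter_expr):
    """Build the argv for one jq call; the agent only controls the filter."""
    if not isinstance(filter_expr, str) or filter_expr.startswith("-"):
        # Refuse anything that jq would parse as a command-line option.
        raise ValueError("filter must be a jq expression, not an option")
    return ["jq", "--compact-output", filter_expr]


def run_jq(filter_expr, data, timeout=5):
    """Run jq on `data` (a JSON string) via stdin and return its stdout.

    The argv list is passed directly to the OS, with no shell in between,
    so characters like `|`, `>` or `;` in the filter are just jq syntax.
    Requires a `jq` binary on PATH.
    """
    result = subprocess.run(
        jq_argv(filter_expr),
        input=data,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Whether this is exposed natively in the harness or through a local MCP server, the surface the agent sees is the same two parameters.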

People seem to think that MCP exists to give agents more capability. That could not be further from the truth, which is actually the opposite: MCP exists to take capability away from agents. It exists to control them.

Let's say you have a jq MCP. How do you pass data in and out to/from it without the data also being processed as tokens?

That's really my only issue with MCPs.

With a shell you can pass data from one component to another directly, which is not only cheaper and faster but also preserves complete integrity. While models nowadays seem to do data echoing well, there's always the chance they might not do it exactly.

There's no reason why a shell would not be able to limit abilities of a party using it as well, by virtue of just implementing only the desired functionality. What makes it more advanced in this context is the (standard) ability to express how to connect multiple components to each other, or to/from local storage. MCP does not have this.

Providing that does not have any inherent danger any more than jq's functions have an inherent danger. Actual execution of processes or real files does not need to be involved.
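A toy version of such a "static shell", to show that pipes and redirection can be offered without running any process or touching real files. Everything here is an assumption for illustration: the three built-ins, the in-memory `files` dict standing in for session files, and the restriction that `|` and `>` must be separated by spaces.

```python
import shlex


# Built-in "commands": plain functions (args, stdin text, session files) -> stdout text.
def cmd_cat(args, stdin, files):
    return "".join(files[name] for name in args) if args else stdin


def cmd_grep(args, stdin, files):
    # Keep only the lines that contain the literal pattern args[0].
    return "".join(line for line in stdin.splitlines(keepends=True) if args[0] in line)


def cmd_wc(args, stdin, files):
    # Count lines, like `wc -l`.
    return f"{len(stdin.splitlines())}\n"


BUILTINS = {"cat": cmd_cat, "grep": cmd_grep, "wc": cmd_wc}


def run(line, files):
    """Run `a | b | c > out` over in-memory session files; return the final stdout."""
    tokens = shlex.split(line)
    target = None
    if len(tokens) >= 2 and tokens[-2] == ">":
        target = tokens[-1]
        tokens = tokens[:-2]
    data = ""
    stage = []
    for token in tokens + ["|"]:        # trailing "|" flushes the last stage
        if token != "|":
            stage.append(token)
            continue
        if not stage or stage[0] not in BUILTINS:
            raise ValueError(f"unknown command: {stage[:1]}")
        data = BUILTINS[stage[0]](stage[1:], data, files)
        stage = []
    if target is not None:
        files[target] = data            # redirection writes a session file
        return ""
    return data
```

Anything that is not a registered built-in is rejected, so the capability surface is exactly the table of functions, while intermediate data moves between stages without passing through the model.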

> Let's say you have a jq MCP. How do you pass data in and out to/from it without the data also being processed as tokens?

Provide a meta-tool which handles piping data in and out of any other tool, and make specific tools which can read/write data sources directly, bypassing context. Or you could go full code mode, but I'm not sure it's worth the lift unless you have Cloudflare numbers of APIs which would need tools.
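One way the meta-tool idea could look, as a hedged sketch: tool outputs are parked in a store owned by the harness, and the model only ever sees a short handle plus a size. `BlobStore`, `pipe`, and the two example tools are hypothetical names, not part of MCP.

```python
import itertools


class BlobStore:
    """Holds tool outputs outside the model context; the model only sees handles."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count(1)

    def put(self, data):
        handle = f"blob:{next(self._ids)}"
        self._blobs[handle] = data
        return handle

    def get(self, handle):
        return self._blobs[handle]


def pipe(store, tool, input_handle=None, **params):
    """Meta-tool: run `tool` on the blob behind `input_handle`, store the output,
    and return only a small summary suitable for the model's context."""
    data = store.get(input_handle) if input_handle else None
    output = tool(data, **params)
    handle = store.put(output)
    return {"handle": handle, "size": len(output)}


# Two stand-in tools: a data source and a transformation.
def fetch_rows(_data, n):
    return "".join(f"row {i}\n" for i in range(n))


def keep_every_other(data):
    return "".join(data.splitlines(keepends=True)[::2])
```

A call like `pipe(store, fetch_rows, n=50000)` returns a dict with a handle and a size, and that handle can be fed to the next `pipe` call, so the 50,000 rows never enter the context.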

I work on an internal model/vendor-independent chat app where the agent runs in the browser - every chat gets its own virtual origin private filesystem (OPFS) [1] directory where user attachments get written to and tools can read from/write to, and users can also provide read/write access to a real directory with window.showDirectoryPicker() [2] (both use the same API, so tools can route to/from either).

It can push and pull MBs of data through tools, e.g. pulling huge spreadsheets directly from SharePoint in 50,000 row chunks using a tool which calls the Excel Services REST API, passing those all into a code execution tool to join them together and process them, which generates an Excel output file using SheetJS, none of which goes into context.

People used to drag their multi-MB documents in and complain either it didn't work or the agent couldn't do anything useful with it. Now it just works.

[1] https://developer.mozilla.org/en-US/docs/Web/API/File_System...

[2] https://developer.mozilla.org/en-US/docs/Web/API/Window/show...

This is the first sensible thing I’ve read in defense of MCP.

However, wouldn’t a shell where all permissions are off by default, and where you can then enable read and write privileges on certain files and directories and execute privileges on certain binaries, accomplish the same thing using UNIX permissions? Isn’t this still reinventing the wheel?

Can an MCP provide prompts for your model to download and use CLIs (and ensure they have the right versions of those tools) in such a way that the data flows through the client side tools?

The more I read this thread the more I'm convinced that the main value of MCP is to provide a server managed release process. This is the same advantage that SaaS has over client side apps.

However, MCPs coupled with a client willing to run tools locally can provide the best of both worlds.

As far as I know, the only way an MCP can provide you data that doesn't go into the context is by providing URLs to the data, and then the model uses e.g. curl to access that data for data manipulation purposes. You could also return result set ids and provide accessors to such data, but ultimately you'd need to provide powerful accessors to that result set to avoid polluting context. These are things that a shell, with all its power, already provides.

It seems like there's little point in MCP in that case. Maybe more point if it was a standard mechanism for MCP to provide additional data, in a completely compatible fashion with all other tools? You could perhaps even pass such URLs to other MCPs. You could have an MCP for jq for doing stream processing. Starts to look a lot like a shell, though.

Seems like MCPs could also be extended to store auxiliary data to your memory (or filesystem..), and then an additional extension to provide that kind of data as auxiliary data in the calls to MCP.

Well, even as is, MCP still provides a standard method of using OAuth for accessing such services. And you must use MCP if you wish to add something to the ChatGPT.com web service, so it's easy to see why OpenAI folks are seeing companies going that way.

>to manipulate or combine the data into a more useful format

why not build this directly into MCPs?

Hmm, indeed, so maybe I could have all this as an MCP, so I can just easily pass any imaginable data manipulation inside it, and then also have it support calling other MCPs, all inside that one MCP, to avoid filling context with intermediate data..

Sounds a lot like a shell to me.

Good idea. We will call this new MCP “bash”. It will allow you to stream the output of one command to the input of another incrementally as the data is generated.

You prevent the LLM from deleting your instances by not granting its AWS user that permission. Whatever tool you let it use to talk to AWS is irrelevant.

So the permissions model is the main advantage MCP has over CLIs?

Is that so surprising? I thought that was a given. And as soon as remote resources are involved, the old "it's in a docker" peace of mind does not apply.