by ondrsh 474 days ago
It's much simpler: MCP allows tools to be added at runtime instead of design-time. That's it. And because this can happen at runtime, the user (NOT the developer) can add arbitrary functionality to the LLM application (while the application is running — hence, runtime). One could make the argument that LLM applications with MCP support are conceptually similar to browsers — both let users connect to arbitrary MCP/HTTP servers at runtime.
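
To make the design-time vs. runtime distinction concrete, here is a minimal sketch. The `ToolRegistry` class and the tool names are hypothetical illustrations, not part of any MCP SDK; the only point is that the developer fixes one set of tools before shipping while the user can extend it while the application runs.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical tool registry for an LLM application (not an MCP SDK class)."""

    def __init__(self, builtin: Dict[str, Callable[..., str]]):
        # Design-time tools: chosen by the developer before shipping.
        self._tools = dict(builtin)

    def add_tool(self, name: str, fn: Callable[..., str]) -> None:
        # Runtime tools: added by the user while the application is running.
        self._tools[name] = fn

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs) -> str:
        return self._tools[name](**kwargs)


# The developer ships the app with a single tool...
registry = ToolRegistry({"echo": lambda text: text})
# ...and the user later adds another one without a new release.
registry.add_tool("shout", lambda text: text.upper())
```

In an MCP-aware application the `add_tool` step would presumably be driven by connecting to a server rather than by passing a Python function, but the shape of the idea is the same.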

But the comparison with HTTP is not a very good one, because MCP is stateful and complex. MCP is actually much more similar to FTP than it is to HTTP.

I wrote 2 short blog posts about this in case anyone is curious: https://www.ondr.sh/blog/thoughts-on-mcp

4 comments

The spec and server docs also contain a helpful explanation:

https://spec.modelcontextprotocol.io/specification/2024-11-0...

https://modelcontextprotocol.io/sdk/java/mcp-server

Also, btw, how long until people rediscover HATEOAS, something which inherently relies on a generalised artificial intelligence to be useful in the first place?

Exactly. An AI-web based on the principles of HATEOAS is the next step, where instead of links, we would have function calls.

As you said, HATEOAS requires a generic client that can understand anything at runtime — a client with general intelligence. Until recently, humans were the only ones fulfilling that requirement. And because we suck at reading JSON, HATEOAS had to use HTML. Now that we have strong AI, we can drop the Hypermedia from 'H'ATEOAS and use JSON instead.

I wrote about that exact thing in Part 2: https://www.ondr.sh/blog/ai-web

Both blog posts were excellent. Thanks for the breakdown.

I’m bullish on MCP. What are some non-obvious things I should consider that might dampen my fire?

TL;DR: IMHO, the MCP enforces too much structure, which makes it vulnerable to disruption by less structured protocols that can evolve according to user needs.

The key reason the web won out over Gopher and similar protocols was that the early web was stupidly simple. It had virtually no structure. In fact, the web might have been the greatest MVP of all time: it handed server developers a blank canvas with as few rules as possible, leading to huge variance in outputs. Early websites differed far more from each other than, for example, Gopher sites, which had strict rules on how they had to work and look.

Yet in a server-client "ping-pong" system, higher variance almost always wins. Why? Because clients consume more of what they like and less of what they don't. This creates an evolutionary selection process: bad ideas die off, and good ideas propagate. Developers naturally seem to develop what people want, but they are not doing so by deliberate choice — the evolutionary process makes it appear so.

The key insight is that the effectiveness of this process stems from a lack of structure. A lack of structure leads to high variance, which lets the protocol escape local minima and evolve according to user needs.

The bear case for MCP is that it's going the exact opposite route. It comes with tons of features, each adding layers of abstractions and structure. While that might work in narrowly understood fields, it's much harder to pull off in novel domains where user preferences aren't clear — knowing what users want is hard. The MCP's rigid structure inherently limits variance in server styles (a trend already observable IMHO), making MCP vulnerable to competition by newer, less structured protocols — similar to how the web steamrolled Gopher, even though the latter initially seemed too far ahead to catch. The fact that almost all MCP servers are self-contained (they don't link to other MCP servers) further means the current lead is not as effective, as the lock-in effect is weaker.

Thanks again for the thorough response.
Under this thesis, SLOP would win, except I don’t yet see how the user can compose it, whereas MCP is supposed to have moved composability to the user?

https://i-love-slop.com/

Seems nice because it's stateless and thus simpler. But it still enforces lots of structure (static entry points, memory, etc.). So if MCP reminds me of FTP/Telnet (bi-directional, stateful), SLOP reminds me of Gopher.

In any case, protocols need killer applications to take off — for the web this killer app was Mosaic. Right now I don't see any application supporting SLOP. If they are able to come up with one that outperforms other MCP-based LLM applications, they will have a chance.

My personal belief is that the winning protocol will be web-like. Right now there is no such protocol. Maybe I'm wrong, let's see.

Yeah, maybe it's because I spent too much time working on another open standard (otel), but this seems pretty obvious (and much simpler -- for now).

MCP standardizes how LLMs can call tools at runtime, and how tools can call LLMs at runtime. It's great!

It sounds like pushing the logic of API calling into one of the many "mcp servers", with the user still needing to go through the manual step of creating accounts on third party services, generating a bunch of different tokens, and dealing with them all.

In essence it seems like an additional shim that removes all the security of API tokens while still leaving the user to deal with them.

Side note, has Tron taught us nothing about avoiding AI MCPs?

Yes, although this is not a consumer play. This is an enterprise play. At my workplace, I'm already signed in to my document portal, debugging tools, Slack, and other tools for my work through Okta SSO. I imagine some future agent I use to sift through various things will have similar access privileges.

Hey ondrsh, I read your blog post and thought it was very interesting. However, I did have a follow-up question:

In your post you say "The key insight is: Because this can happen at runtime, the user (NOT the developer) can add arbitrary functionality to the application (while the application is running — hence, runtime). And because this also works remotely, it could finally enable standardized b2ai software!"

That makes sense, but my question is: how would the user actually do that? As far as I understand, they would have to somehow pass in either a script to spin up their own server locally (unlikely for your everyday user), or a url to access some live MCP server. This means that the host they are using needs an input on the frontend specifically for this, where the user can input a url for the service they want their LLM to be able to talk to. This then gets passed to the client, the client calls the server, the server returns the list of available tools, and the client passes those tools to the LLM to be used.
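
The flow described above (URL in, tool list out, tools handed to the LLM) could be sketched roughly as follows. This is a simplified illustration under several assumptions: the server is an in-process function instead of a real network endpoint, the URL and the `get_weather` tool are made up, and the initialization handshake a real MCP session would need is omitted. Only the JSON-RPC 2.0 message shape and the `tools/list` method name are meant to mirror the spec.

```python
import json
from typing import Any, Dict, List


def fake_server(request: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a remote MCP server; a real one sits behind a transport."""
    if request["method"] == "tools/list":
        tools = [{
            "name": "get_weather",  # hypothetical tool
            "description": "Return the weather for a city.",
            "inputSchema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        }]
        return {"jsonrpc": "2.0", "id": request["id"], "result": {"tools": tools}}
    # Standard JSON-RPC error for unknown methods.
    return {"jsonrpc": "2.0", "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"}}


# Hypothetical URL-to-transport table that replaces real networking.
TRANSPORTS = {"https://example.com/mcp": fake_server}


def add_server(url: str) -> List[Dict[str, Any]]:
    """Host-side handler for a 'paste a server URL' input field."""
    send = TRANSPORTS[url]  # 1. the host hands the URL to a client
    reply = send({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})  # 2. client asks
    return reply["result"]["tools"]  # 3. server answers with its tool list


def tools_for_llm(tools: List[Dict[str, Any]]) -> str:
    """4. Render the tool list as text that can be given to the LLM."""
    return json.dumps(tools, indent=2)


tools = add_server("https://example.com/mcp")
prompt_fragment = tools_for_llm(tools)
```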

This is very cool and all, but it just seems like anyone who has minimal tech skills would not have the patience to go and find the MCP server url of their favourite app and then paste it into their chatbot or whatever they're using.

Let me know if I have misunderstood anything, and thanks in advance!

Your understanding is on point.

> As far as I understand, they would have to somehow pass in either a script to spin up their own server locally (unlikely for your everyday user), or a url to access some live MCP server. This means that the host they are using needs an input on the frontend specifically for this, where the user can input a url for the service they want their LLM to be able to talk to. This then gets passed to the client, the client calls the server, the server returns the list of available tools, and the client passes those tools to the LLM to be used.

This is precisely how it would work. Currently, I'm not sure how many host applications (if any) actually feature a URL input field to add remote servers, since most servers are local-only for now. This situation might change once authentication is introduced in the next protocol version. However, as you pointed out, even if such a URL field existed, the discovery problem remains.

But discovery should be an easy fix, in my opinion. Crawlers or registries (think Google for web or Archie for FTP) will likely emerge, so host applications could integrate these external registries and provide simple one-click installs. Apparently, Anthropic is already working on a registry API to simplify exactly this process. Ideally, host applications would automatically detect when helpful tools are available for a given task and prompt users to enable them.
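
As a rough illustration of what registry-backed suggestions and one-click installs might look like inside a host application, here is a sketch. Everything in it is an assumption: the registry entries, the URLs, and the naive keyword matching are invented for the example and say nothing about how an actual registry API would work.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegistryEntry:
    name: str
    url: str
    keywords: List[str]


# Hypothetical registry contents; a real host would fetch these remotely.
REGISTRY = [
    RegistryEntry("calendar", "https://example.com/calendar-mcp", ["meeting", "schedule"]),
    RegistryEntry("maps", "https://example.com/maps-mcp", ["route", "directions"]),
]


@dataclass
class Host:
    installed: List[str] = field(default_factory=list)

    def suggest(self, task: str) -> List[RegistryEntry]:
        """Suggest not-yet-installed servers whose keywords appear in the task."""
        words = set(task.lower().split())
        return [entry for entry in REGISTRY
                if entry.url not in self.installed and words & set(entry.keywords)]

    def install(self, entry: RegistryEntry) -> None:
        """The 'one click': remember the server URL so the host can connect to it."""
        self.installed.append(entry.url)


host = Host()
suggestions = host.suggest("Schedule a meeting with Alice")
for entry in suggestions:
    host.install(entry)
```

A real host would likely let the model or an embedding search do the matching instead of comparing keywords, and would ask the user before enabling anything.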

The problem with local-only servers is that they're hard to distribute (just as local HTTP servers are) and that sandboxing is an issue. One workaround is using WASM for server development, which is what mcp.run is doing (https://docs.mcp.run/mcp-clients/intro), but of course this breaks the seamless compatibility.

Amazing, that makes a lot of sense. The idea of having one-click installs is very cool. I still think that for the everyday consumer it might be a small roadblock that they have to know what tools to use before being able to use them, and having that tool suggestion mechanism you mentioned would really bring everything together.

Thanks for the awesome feedback, and congrats on the blog posts by the way, they are a great read!

I guess if someone like Anthropic builds a proper registry, then the user wouldn't have to decide and the AI could decide itself?

What does it actually offer over OpenAPI, though? If I feed an OpenAPI spec to an LLM, it can use it as a tool.

It seems like you're describing a scenario where you know at design-time which tools will be included. In that case the benefit of using MCP is less clear.

While you usually get tools that work out of the box with MCP (and thus avoid the hassle of prompting + testing to get working tool code), integrating external APIs manually often results in higher accuracy and performance, as you're not limited by the abstractions imposed by MCP.

Any API can be modeled as JSON in, JSON out, which you can pass to the system prompt at design time or at runtime, no?

I'm not sure I fully understand your scenario. Who will be doing the actual network requests?

MCP is basically a trifecta of:

  1) MCP-aware LLM applications
  2) MCP clients
  3) MCP servers

The LLM application is key here. It is doing all the "plumbing", like spawning MCP clients to connect to MCP servers — similar to how your web browser is spawning HTTP clients to connect to HTTP servers. The LLM application thus initiates and receives the actual requests between MCP client and MCP server, manages MCP client/server pairs, injects tool results into the LLM context et cetera. This means the LLM application must be MCP-aware at design-time. But because all of this plumbing can then happen at runtime under the hood, the user (who adds MCP tools while the application is running) does not need to be a developer.
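
The plumbing role of the LLM application could be sketched like this. It is a toy model under stated assumptions: `Client`, `LLMApplication`, and `math_server` are hypothetical names, servers are plain Python callables rather than processes or remote endpoints, and the context is just a list of dicts. It only illustrates the three responsibilities named above: managing client/server pairs, routing requests, and injecting tool results into the LLM context.

```python
from typing import Any, Callable, Dict, List

# A server is modeled as a function: (tool name, arguments) -> result text.
Server = Callable[[str, Dict[str, Any]], str]


class Client:
    """One client per connected server (hypothetical class)."""

    def __init__(self, server: Server, tool_names: List[str]):
        self.server = server
        self.tool_names = tool_names

    def call_tool(self, name: str, arguments: Dict[str, Any]) -> str:
        return self.server(name, arguments)


class LLMApplication:
    """Owns the client/server pairs and the LLM context."""

    def __init__(self) -> None:
        self.clients: List[Client] = []
        self.context: List[Dict[str, str]] = []

    def add_server(self, server: Server, tool_names: List[str]) -> None:
        # Happens at runtime, triggered by the user: spawn a client for the server.
        self.clients.append(Client(server, tool_names))

    def handle_tool_request(self, name: str, arguments: Dict[str, Any]) -> str:
        # Route the LLM's tool request to whichever client offers that tool.
        for client in self.clients:
            if name in client.tool_names:
                result = client.call_tool(name, arguments)
                # Inject the tool result into the LLM context.
                self.context.append({"role": "tool", "name": name, "content": result})
                return result
        raise KeyError(f"no connected server offers tool {name!r}")


def math_server(name: str, arguments: Dict[str, Any]) -> str:
    """Hypothetical server exposing a single 'add' tool."""
    return str(arguments["a"] + arguments["b"])


app = LLMApplication()
app.add_server(math_server, ["add"])
app.handle_tool_request("add", {"a": 2, "b": 3})
```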

As a developer, MCP allows you to write:

  1) MCP-aware LLM applications
  2) MCP servers

MCP-aware LLM applications (like Claude Desktop or Cursor) let their users add arbitrary functionality (i.e. other MCP servers) at runtime.

MCP servers can be added by users of MCP-aware LLM applications at runtime.

Both revolve around the concept of giving non-developers a way to add functionality at runtime. Most developers are confused about MCP because they don't need to do either 1) or 2). Instead, they add tools themselves to the applications they write (at design-time) and then ship them.