by saurik 472 days ago
1) Ok, so you are reinventing SOAP or WSDL or whatever... did that ever go well? How and why is this different from every prior attempt to create the one true API layer?

2) Is this meaningfully different from just having every API provide a JavaScript SDK to access it, and then having the model write code? That's how humans solve this stuff.

3) If the AI is actually as smart at doing tasks like writing clients for APIs as people like to claim, why does it need this to be made machine readable in the first place?


1) Valid point, this could have been WSDL/Swagger. But the MCP spec supports spinning up local applications and communicating with them via stdio, which OpenAPI cannot do.

2 + 3) Having a few commands that the AI knows it should call, and can call confidently without security concerns, is better than giving the AI permission to do everything under the sun and telling it to code a program that does so.

The prompt for the latter is also much more complex and does not work as predictably.
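
To make the stdio point in (1) concrete, here is a minimal sketch of a client that spawns a local process and exchanges one newline-delimited JSON-RPC 2.0 message with it over stdin/stdout. It does not use any real MCP SDK: the toy server, the `echo` method, and the `call_local_server` helper are all made up for illustration, and the real protocol has an initialization handshake that is skipped here.

```python
import json
import subprocess
import sys

# Toy "server": reads one JSON-RPC request per line on stdin, answers on stdout.
# The "echo" method is hypothetical and not part of any real spec.
SERVER_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    if request.get("method") == "echo":
        response = {"jsonrpc": "2.0", "id": request["id"], "result": request["params"]}
    else:
        response = {"jsonrpc": "2.0", "id": request["id"],
                    "error": {"code": -32601, "message": "Method not found"}}
    sys.stdout.write(json.dumps(response) + "\n")
    sys.stdout.flush()
'''


def call_local_server(method, params):
    """Spawn the toy server as a child process and send it one request over stdio."""
    proc = subprocess.Popen(
        [sys.executable, "-c", SERVER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
        proc.stdin.write(json.dumps(request) + "\n")
        proc.stdin.flush()
        # One line in, one line out: the newline is the message delimiter.
        return json.loads(proc.stdout.readline())
    finally:
        proc.stdin.close()  # EOF on stdin ends the server loop
        proc.wait(timeout=5)
        proc.stdout.close()
```

Calling `call_local_server("echo", {"text": "hi"})` should return a response whose `result` mirrors the params. No port, HTTP server, or network configuration is involved, which is the part a plain OpenAPI description has no way to express.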

Question three is what hits the nail on the head about how this “AI revolution” isn’t as robust as often claimed.

If it was truly intelligent it could reason about things like API specifications without any precursors or shared structure, but it can’t.

Are LLMs powerful? Yes. Is current “AI” simply a re-brand of machine learning? IMO, also yes

> If it was truly intelligent it could reason about things like API specifications without any precursors or shared structure, but it can’t

I can reason about any API or specification. But when I'm trying to get a different, compound, and higher-level task done, it's quite a bit faster and less distracting if I can rely on someone else to have already distilled what I need (into a library, cheat-sheet, tutorial, etc).

Similarly, I've seen LLMs do things like generate clients and scripts for interacting with APIs. But it's a lot easier to just hand them one ready to go.

It doesn’t negate my point: the technology can’t reason its way through an arbitrary API specification on its own. If it could, this wouldn’t be needed. Humans benefit from this simplification, but why would a machine that can think 10000x faster than a human?

My impression, and perhaps this is wildly off, is that MCP could be useful to whitelist safe usage of tools by LLMs.

I say this out loud so someone can correct me if I’m mistaken!

Then it's a useless concept, because people who use LLMs don't want to be bounded by a whitelist.

Strong disagree. I want absolute control over what tools my agent can access on my computer.

Do you want your tech landlord to have absolute control over what tools your agent can use on your computer?

As I understand it, I maintain the whitelist, not the tech overlord.

That’s sort of the point of MCP, as near as I can tell.
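
A minimal sketch of the "I maintain the whitelist" idea, assuming the host application sits between the model and the tools. Everything here is hypothetical (the `ToolGate` class and both tool functions); it only illustrates that the allowed set is supplied by the user, not by the tool author or the model vendor.

```python
# Hypothetical tools; names and behavior are illustrative only.
def shout(text):
    return text.upper()


def count_chars(text):
    return len(text)


class ToolGate:
    """Dispatches a tool call only if the user has allowed that tool by name."""

    def __init__(self, tools, allowed):
        self._tools = dict(tools)
        self._allowed = set(allowed)  # maintained by the user

    def call(self, name, **kwargs):
        if name not in self._allowed:
            raise PermissionError(f"tool {name!r} is not on the allowlist")
        if name not in self._tools:
            raise KeyError(f"unknown tool {name!r}")
        return self._tools[name](**kwargs)


# Two tools are installed, but the user has only approved one of them.
gate = ToolGate(
    tools={"shout": shout, "count_chars": count_chars},
    allowed={"shout"},
)
```

Here `gate.call("shout", text="hi")` goes through, while `gate.call("count_chars", text="hi")` raises `PermissionError` even though the tool is installed.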

Exactly. Any junior developer can reason about an API and integrate with it.

But LLMs will replace them?

I wouldn't call it another form of API. It's more like an SDK. If you were accessing a REST API from Android, iOS, Windows, Mac, Firefox, they'd be mostly the same. But an SDK for Android and an SDK for iOS have each been built for their platform. Often the SDK encapsulates the official API.

That's a direct answer for (2) too - instead of writing a JS SDK or Swift SDK or whatever, it's an AI SDK and shared across Claude, OpenAI, Groq, and so on.

(3) is exactly related to this. The AI has been trained to run MCPs, viewing them as big labeled buttons in their "mind".

I think you got the questions spot on and the answers right there as well.

I didn't have a good term so I went with "API layer" (not merely "API"), but, to try to clarify... that's what you also get with SOAP/WSDL or any of the other numerous attempts over the years to build an API "layer" thing: you can use the one universal SDK you have, plus only the schema / IDL, to use the API. Every time people try to describe MCP it just sounds like yet another API description language when we already have a giant drawer of those that never really worked out, including OpenA"P"I (lol ;P).

https://www.openapis.org/

Regardless, again: if the AI is so smart, and it somehow needs something akin to MCP as input (which seems silly), then we can use the AI to take, as input, the human readable documentation -- which is what we claim these AIs can read and understand -- and just have it output something akin to MCP. The entire point of having an AI agent is that it is able to do things similar to a software developer, and interfacing with a random API is probably the most trivial task you can possibly do.

MCP is more like a UI that is optimized for LLMs for interacting with a tool or data source. I'd argue that an API is not a user interface and that's not really their intention.

> Regardless, again: if the AI is so smart, and it somehow needs something akin to MCP as input (which seems silly), then we can use the AI to take, as input, the human readable documentation -- which is what we claim these AIs can read and understand -- and just have it output something akin to MCP.

This example is like telling someone who just wants to check their email to build an IMAP client. It's an unnecessary and expensive distraction from whatever goal they are actually trying to accomplish.

As others have said, models are now being trained on MCP interactions. It's analogous to having shared UI/UX patterns across different webapps. The result is we humans don't have to think as hard to understand how to use a new tool because of the familiar visual and interaction patterns. As the design book title says, 'don't make me think.'

> I'd argue that an API is not a user interface and that's not really their intention.

API is a user interface for other developers – just like MCP is a UI for LLMs.

The U in UI is User, and refers to the human. If something is Interfacing with an Application, and that something is another Application instead of a human, that means the interaction is happening via an Application Interface (to make things easier, we'll call it an AI for short...just kidding).

That seems to be what happens here with MCP: it is a way for an Application (the LLM) to derive programming by Interfacing with another Application (the 3rd party API provider, for example).

That would make MCP an API for accessing other APIs. Not that that's bad, computers are layers of abstraction all the way down. At the same time though, we already have some of those. Perhaps some sort of OpenAPI bridge would be useful in the same manner and not require rewriting API specs, but that probably exists, too.
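
As a rough idea of what such an OpenAPI bridge could look like, the sketch below maps one OpenAPI operation object onto a tool descriptor with `name`, `description`, and a JSON Schema `inputSchema`. Those three field names follow my understanding of how MCP describes tools, so treat them as an assumption; the `operation_to_tool` helper is hypothetical, and it ignores request bodies, `$ref` resolution, and authentication.

```python
def operation_to_tool(path, method, operation):
    """Map one OpenAPI operation object to a tool descriptor (illustrative only)."""
    properties = {}
    required = []
    for param in operation.get("parameters", []):
        # Copy the parameter's schema so the input document is not mutated.
        schema = dict(param.get("schema", {"type": "string"}))
        if "description" in param:
            schema["description"] = param["description"]
        properties[param["name"]] = schema
        if param.get("required", False):
            required.append(param["name"])

    # Fall back to a name derived from method and path if operationId is absent.
    fallback_name = f"{method.lower()}_{path.strip('/').replace('/', '_')}"
    return {
        "name": operation.get("operationId", fallback_name),
        "description": operation.get("summary", ""),
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# A made-up operation in the style of the usual pet store examples.
example_operation = {
    "operationId": "getPet",
    "summary": "Fetch a pet by id",
    "parameters": [
        {"name": "petId", "in": "path", "required": True, "schema": {"type": "integer"}},
    ],
}
tool = operation_to_tool("/pets/{petId}", "GET", example_operation)
```

The mapping is mostly mechanical, which supports the point that existing API specs would not need to be rewritten by hand.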

Who am I kidding, though? The AI assistants/agents are going to be writing whatever manifests are necessary to run more AI, so it'll be a negligible increase in effort to do both.

> If something is Interfacing with an Application, and that something is another Application instead of a human, that means the interaction is happening via an Application Interface

My point is, the applications have been (until recently) predominantly written by humans. An API is the interface developers use through the code they write. Just like a UI can be better or worse, so can an API: it might be concise, expressive, consistent – or verbose, clunky and completely unpredictable. Just like in a UI you don’t want to click through dozens of submenus, in an API you don’t want to make a dozen calls to do something simple. It’s way more similar than you think!

Now where MCP fits in here is a whole other question...

Why not JSON/XML?

{"action": "create_directory", "value": "foobar/assets/"} is 15 tokens whereas create_directory("foobar/assets/") is 7 tokens. It's not the exact format, but you get the idea.

It's not just about cost; higher token counts also result in lower performance. It's as hard for the LLM to read this as it is for you to read it.
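
The two notations carry the same information, so the compact one can be expanded mechanically on the receiving side. Below is a sketch that parses the call-style form with Python's `ast` module and produces a JSON-style dict. The `call_to_action` helper and the `args` key are my own choices (the comment above uses `value`), and the comparison at the end is in characters, not tokens, since token counts depend on the tokenizer.

```python
import ast
import json


def call_to_action(source):
    """Parse a single call such as create_directory("foobar/assets/") into a dict."""
    node = ast.parse(source.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("expected a single function call with a plain name")
    if node.keywords:
        raise ValueError("keyword arguments are not handled in this sketch")
    # literal_eval only accepts literals, so no model-written code is executed.
    args = [ast.literal_eval(arg) for arg in node.args]
    return {"action": node.func.id, "args": args}


compact = 'create_directory("foobar/assets/")'
verbose = json.dumps(call_to_action(compact))

# Character lengths only; this is a crude proxy, not a token count.
lengths = {"compact": len(compact), "verbose": len(verbose)}
```

Anything that is not a plain call with literal arguments, such as `1 + 2` or `f(x)`, is rejected with `ValueError` rather than evaluated.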

I did some experiments with protocols last year. YAML was the most efficient one by a large margin, and yet it often made mistakes. Half the output layer code is dedicated to fixing common mistakes in formatting, like when it forgets to put in a dash or merges one parameter with another. We had 2/3 of the input prompt dedicated to explaining the spec and giving examples. It's definitely not trivial.

MCP is pre-trained into the models, no need for all this.

The work we had it on should not have needed a good model, but we had to use a more expensive one anyway, and most open source/self-trained ones didn't do the trick. We ended up taking a 3x more expensive model. Also don't look at it as LLMs being smart enough to do it; we also want something for the dumb & cheap micro LLMs as well, and micro LLMs will likely be doing agentic work.

It's also as likely to make mistakes as a human - LLMs didn't output JSON until mid 2024. Gemini was one of the first to officially feature JSON output and it was still randomly breaking by Sept 2024 with JSON arrays, even when giving the API a properly detailed spec to respond in.
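
This is the kind of guard an output layer ends up needing when structured output can break at random: parse, check the shape, and ask again on failure. The sketch assumes a hypothetical `generate` callable that returns the model's raw text, and the required `action`/`value` fields are borrowed from the JSON example earlier in the thread rather than from any real spec.

```python
import json


def parse_action(raw):
    """Parse model output and check its shape; raise ValueError if it is unusable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field in ("action", "value"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"missing string field {field!r}")
    return data


def ask_until_valid(generate, max_attempts=3):
    """Call generate() until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_action(generate())
        except ValueError as exc:
            last_error = exc
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Every retry costs another model call, which is part of why a format the model already emits reliably is attractive.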

They can improve it, but they have to train it on something and they might as well make something up that's more efficient. OpenAI might do one too. Even with images we see newer protocols like HEIC, WEBP when PNG works fine. I expect MCP will live because it's particularly suited to this use case.

No, an API is the closest term, as calling MCP, which is a simple protocol, an SDK is literally wrong.

A protocol is not a software development kit.

I am not even the person who said "I wouldn't call it another form of API"... can you maybe just stick to the concrete and explain how this is different from SOAP/WSDL or OpenAPI/Swagger? I honestly don't even know what short term either I or you would now use for any of them, but I feel confident that the overall premise described by this article doesn't differentiate: they offer a standardized connector to various tools and data sources behind otherwise different APIs, even offering--through a registry such as UDDI--dynamic discovery of these resources. The only not-really-a-difference I feel is that we are explicitly describing wrapping the other APIs into this protocol... but like, that's what you'd of course have to do to expose an existing API via SOAP (which is also a capital-P-stands-for-Protocol).

Yup. Though it's a JSON API and not an XML API like SOAP. As for how it differs from the old OpenAPI/Swagger... shrug... not much... it's got a lot less flexibility to it and it has specific tooling for agentic tool use.

https://github.com/modelcontextprotocol/specification/blob/m...

MCP tends to be much simpler and less powerful than an API that you’d actually try to develop against. The LLMs need the most simplified access patterns possible.

Most people hate SOAP and WSDL. You can argue most web APIs are reinventing them in the sense that you could reimplement them with WSDL, to get worse versions of them.