| Are you two talking at cross-purposes because you don't have a shared understanding of control and data flow? The pieces here are: * Claude Code, a Node (Javascript) application that talks to MCP server(s) and the Claude API * The MCP server, which exposes some tools through stdin or HTTP * The Claude API, which is more structured than "text in, text out". * The Claude LLM behind the API, which generates a response to a given prompt Claude Code is a Node application. CC is configured in JSON with a list of MCP servers. When CC starts up, CC"s Javascript initialises each server and as part of that gets a list of callable functions. When CC calls the LLM API with a user's request, it's not just "here is the user's words, do it". There are multiple slots in the request object, one of which is a "tools" block, a list of the tools that can be called. Inside the API, I imagine this is packaged into a prefix context string like "you have access to the following tools: tool(args) ...". The LLM API probably has a bunch of prompts it runs through (figure out what type of request the user has made, maybe using different prompts to make different types of plan, etc.) and somewhere along the way the LLM might respond with a request to call a tool. The LLM API call then returns the tool call request to CC, in a structured "tool_use" block separate from the freetext "hey good news, you asked a question and got this response". The structured block means "the LLM wants to call this tool." CC's JS then calls the server with the tool request and gets the response. It validates the response (e.g., JSON schemas) and then calls the LLM API again bundling up the success/failure of the tool call into a structured "tool_result" block. If it validated and was successful, the LLM gets to see the MCP server's response. If it failed to validate, the LLM gets to see that it failed and what the error message was (so the LLM can try again in a different way). The idea is that if a tool call is supposed to return a CarMakeModel string ("Toyota Tercel") and instead returns an int (42), JSON Schemas can catch this. The client validates the server's response against the schema, and calls the LLM API with {
"type": "tool_result",
"tool_use_id": "abc123",
"is_error": true,
"content": [
{
"type": "text",
"text": "Expected string, got integer."
}
]
}
So the LLM isn't choosing to call the validator, it's the deterministic Javascript that is Claude Code that chooses to call the validator.There are plenty of ways for this to go wrong: the client (Claude Code) has to validate; int vs string isn't the same as "is a valid timestamp/CarMakeModel/etc"; if you helpfully put the thing that failed into the error message ("Expect string, got integer (42)") then the LLM gets 42 and might choose to interpret that as a CarMakeModel if it's having a particularly bad day; the LLM might say "well, that didn't work, but let's assume the answer was Toyota Tercel, a common car make and model", ... We're reaching here, yet these are possible. But the basic flow has validation done in deterministic code and hiding the MCP server's invalid responses from the LLM. The LLM can't choose not to validate. You seemed to be saying that the LLM could choose not to validate, and your interlocutor was saying that was not the case. I hope this helps! |
No they're literally just skipping an entire step into how LLM's actually "use" MCP.
MCP is just a standard, largely for humans. LLM's do not give a singular fuck about it. Some might be fine tuned for it to decrease erroneous output, but at the end of the day it's just system prompts.
And respectfully, your example misunderstands what is going on:
>* The Claude API, which is more structured than "text in, text out".
>* The Claude LLM behind the API, which generates a response to a given prompt
No. That's not what "this" is. LLM's use MCP to discover tools they can call, aka function/tool calling. MCP is just an agreed upon format, it doesn't do anything magical; it's just a way of aligning the structure across companies, teams, and people.
There is not an "LLM behind the API", while a specific tool might implement its overall feature set using LLM's, that's totally irrelevant to what's being discussed and the principle point of contention.
Which is this: an LLM interacting with other tools via MCP still needs system prompts or fine tuning to do so. Both of those things are not predictable or deterministic. They will fail at some point in the future. That is indisputable. It is a matter of statistical certainty.
It's not up for debate. And an agreed upon standard between humans that ultimately just acts as convention is not going to change that.
It is GRAVELY concerning that so many people are trying to use technical jargon of which they clearly are ill-equipped to do so. The magic rules all.