by fennecfoxy
315 days ago
In my opinion, I get the desire to create some sort of specification for an LLM to interface with [everything else], but I don't really see the point of doing it on an inference level by smashing JSON into the context. These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out". If MCP had been a specification for _training_ models to support tool use on an architectural level, not just training it to ask to use a tool with a special token as they do now. It's an interesting topic because it's the exact same as the boundary between humans (sloppy, organic, analog messes) and traditional programs (rigid types, structures, formats). To be fair if we can build tool use in architecturally and solve the boundary between these two areas then it also works for things like objective facts. LLMs are just statistical machines and data in the context doesn't really mean all that much, we just hope it is statistically relevant given some input and it is often enough that it works, but not guaranteed.
This is mostly the kind of misunderstanding of MCP that the article seems aimed at, and much of this response covers points the article already makes, but:
MCP isn't for the models; it is for the toolchains supporting them. The information a model actually needs about tools and resources is fetched from the server by the toolchain, using what MCP provides, and the structure in which that information is presented to the model varies by model. That structure is consistently different from what MCP itself carries: the tool and resource names (but probably not the prompt names) from the MCP server will likely also be given to the model, but that's about the only direct overlap. MCP can also define prompts for the toolchain, but information about those is more likely presented directly to the user than to the model.
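A minimal sketch of the toolchain's side of this, assuming the toolchain has already received the result of an MCP `tools/list` request (tool objects with `name`, optional `description`, and `inputSchema`). It reshapes that result into two different provider-style tool formats; the `get_weather` tool and the function names are hypothetical, and which fields get forwarded, and in what shape, is the toolchain's choice rather than something MCP dictates.

```python
import json
from typing import Any

# Example result of an MCP `tools/list` request as a toolchain might receive it.
# The weather tool itself is hypothetical.
MCP_TOOLS_LIST_RESULT: dict[str, Any] = {
    "tools": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "inputSchema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ]
}


def to_openai_tools(mcp_result: dict[str, Any]) -> list[dict[str, Any]]:
    """Reshape MCP tool definitions into OpenAI-style function tools."""
    return [
        {
            "type": "function",
            "function": {
                "name": tool["name"],
                "description": tool.get("description", ""),  # optional in MCP
                "parameters": tool["inputSchema"],
            },
        }
        for tool in mcp_result["tools"]
    ]


def to_anthropic_tools(mcp_result: dict[str, Any]) -> list[dict[str, Any]]:
    """Reshape MCP tool definitions into Anthropic-style tools."""
    return [
        {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "input_schema": tool["inputSchema"],
        }
        for tool in mcp_result["tools"]
    ]


if __name__ == "__main__":
    # Same MCP data, two different shapes depending on which model API is in use.
    print(json.dumps(to_openai_tools(MCP_TOOLS_LIST_RESULT), indent=2))
    print(json.dumps(to_anthropic_tools(MCP_TOOLS_LIST_RESULT), indent=2))
```

The model never sees the `tools/list` JSON-RPC exchange itself; it only sees whatever the toolchain builds from it.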
The toolchain also needs to know how the model was trained to receive tool information in its prompt, just as it needs to know other aspects of the model's preferred prompt template, but that is a separate concern from MCP.
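To illustrate that separation, here is a rough sketch of a toolchain that builds raw prompts for local models, assuming a per-model registry of templates. Both templates, the model keys (`model-a`, `model-b`), and all function names are hypothetical; real models each define their own tool-prompt format. The point is only that the template is chosen by model, not by anything MCP supplies.

```python
import json
from typing import Any, Callable

Tool = dict[str, Any]


def render_tag_style(tools: list[Tool], user_msg: str) -> str:
    # Hypothetical template: one JSON tool spec per line, wrapped in <tools> tags.
    specs = "\n".join(json.dumps(t) for t in tools)
    return f"<tools>\n{specs}\n</tools>\nUser: {user_msg}\nAssistant:"


def render_system_text_style(tools: list[Tool], user_msg: str) -> str:
    # Hypothetical template: tools listed as plain text in a system preamble.
    lines = [f"- {t['name']}: {t.get('description', '')}" for t in tools]
    return (
        "System: You can call these tools:\n"
        + "\n".join(lines)
        + f"\nUser: {user_msg}\nAssistant:"
    )


# Template selection is keyed on the model, independent of where the tools came from.
TEMPLATES: dict[str, Callable[[list[Tool], str], str]] = {
    "model-a": render_tag_style,
    "model-b": render_system_text_style,
}


def build_prompt(model: str, tools: list[Tool], user_msg: str) -> str:
    """Render the same tool list with whichever template the given model expects."""
    return TEMPLATES[model](tools, user_msg)
```

For example, `build_prompt("model-b", [{"name": "get_weather", "description": "Get weather"}], "hi")` produces a prompt containing the line `- get_weather: Get weather`, while `"model-a"` produces a `<tools>` block for the same input.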
> If MCP had been a specification for _training_ models to support tool use on an architectural level, not just training it to ask to use a tool with a special token as they do now.
MCP isn't a specification for training anything. MCP is a specification for giving the toolchain that runs the LLM information about tools that are external to that toolchain. Tools internal to the toolchain never use MCP because, again, MCP isn't for the model; it's for the toolchain.
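A small sketch of that split, assuming a toolchain that has parsed a tool call out of the model's output as a name plus arguments. Internal tools are plain function calls; only tools learned from an MCP server turn into an MCP `tools/call` JSON-RPC 2.0 request. The `add` and `get_weather` tools and the `dispatch` helper are hypothetical, and this sketch returns the request instead of actually sending it over a transport.

```python
import itertools
import json
from typing import Any, Callable

_request_ids = itertools.count(1)

# Tools implemented inside the toolchain itself: called directly, no MCP involved.
INTERNAL_TOOLS: dict[str, Callable[..., Any]] = {
    "add": lambda a, b: a + b,
}

# Names of tools the toolchain learned about from an MCP server (hypothetical).
MCP_TOOL_NAMES = {"get_weather"}


def make_tools_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request in the shape MCP defines."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def dispatch(name: str, arguments: dict[str, Any]) -> Any:
    """Route a model's tool call: internal tools run directly, external ones via MCP."""
    if name in INTERNAL_TOOLS:
        return INTERNAL_TOOLS[name](**arguments)
    if name in MCP_TOOL_NAMES:
        # A real toolchain would send this to the MCP server over its transport
        # (e.g. stdio or HTTP) and wait for the response; here we only build it.
        return make_tools_call_request(name, arguments)
    raise KeyError(f"unknown tool: {name}")
```

From the model's side, both calls look the same; whether MCP is involved at all is decided entirely inside the toolchain.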