
by fennecfoxy 315 days ago

I get the desire to create some sort of specification for an LLM to interface with [everything else], but I don't really see the point of doing it at the inference level by smashing JSON into the context.

These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out".

If MCP had been a specification for _training_ models to support tool use on an architectural level, not just training it to ask to use a tool with a special token as they do now, then I would see more of a point to it.

It's an interesting topic because it's the exact same as the boundary between humans (sloppy, organic, analog messes) and traditional programs (rigid types, structures, formats).

To be fair, if we can build tool use in architecturally and solve the boundary between these two areas, then the same approach also works for things like objective facts. LLMs are just statistical machines, and data in the context doesn't really mean all that much; we just hope it is statistically relevant given some input. Often enough it is, so it works, but that is not guaranteed.

2 comments

dragonwriter 315 days ago

> These models are usually very decent at parsing out stuff like that anyway; we don't need the MCP spec, everyone can just specify the available tools in natural language and then we can expect large param models to just "figure it out".

This is mostly the kind of misunderstanding of MCP that the article seems directed at, and much of this response is focussed on things that are key points in the article, but:

MCP isn't for the models, it is for the toolchains supporting them. The information models actually need about tools and resources is fetched from the server by the toolchain, using the information that is in the MCP. The structure models use for that information varies by model, but it is consistently different from what is in the MCP. The tool and resource (but probably not prompt) names from the MCP will probably also be given to the model, but that's pretty much the only direct overlap. MCP can also define prompts for the toolchain, but information about those is more likely presented directly to the user than to the model itself.

The toolchain also needs to know how the model is trained to get tool information in its prompt, just like it needs to know other aspects of the model's preferred prompt template, but that is a separate concern from MCP.
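To make that split concrete, here is a minimal sketch of a toolchain reshaping one tool descriptor for two different models. It assumes the descriptor uses the `name`, `description`, and `inputSchema` keys of an MCP tool listing, and that one target API takes `name`, `description`, and `input_schema`. The `get_stock_price` tool and the plain-text template are made up for illustration; a real model's expected template would differ.

```python
import json

# A tool descriptor shaped like one entry of an MCP tool listing.
# The tool itself (get_stock_price) is hypothetical.
mcp_tool = {
    "name": "get_stock_price",
    "description": "Get the current price for a ticker symbol.",
    "inputSchema": {
        "type": "object",
        "properties": {"ticker": {"type": "string"}},
        "required": ["ticker"],
    },
}


def to_api_tool(tool):
    """Reshape the descriptor for an API that has a dedicated tools parameter."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["inputSchema"],
    }


def to_prompt_text(tool):
    """Render the descriptor as text for a model that reads tools from its prompt.

    The template below is invented; each model has its own trained format.
    """
    schema = json.dumps(tool["inputSchema"], sort_keys=True)
    return "\n".join([
        "Tool: " + tool["name"],
        "Description: " + tool["description"],
        "Arguments (JSON Schema): " + schema,
    ])


api_tool = to_api_tool(mcp_tool)
prompt_text = to_prompt_text(mcp_tool)
```

The point of the sketch is only that the same descriptor feeds both shapes, and that choosing between them is the toolchain's job rather than something the spec decides.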

> If MCP had been a specification for _training_ models to support tool use on an architectural level, not just training it to ask to use a tool with a special token as they do now.

MCP isn't a specification for training anything. MCP is a specification for providing information about tools external to the toolchain running the LLM to the toolchain. Tools internal to the toolchain don't ever use MCP because, again, MCP isn't for the model, it's for the toolchain.


fennecfoxy 311 days ago

You've replied multiple times specifying toolchains without explaining what they are.

For models that don't support tool defs via the API, I've seen those tool defs provided in the context (though the model is still trained for tool use, outputting the special python_call/x tokens to indicate a tool call in its output).

I can see, for example, that MCP's own example using Anthropic uses the tools section of their API/SDK, as outlined here: https://docs.anthropic.com/en/api/messages#body-tools. What the example does is shove the tool definition in there, including the full name, description, etc. of the tool.

Quoting them: "And then asked the model "What's the S&P 500 at today?", the model might produce tool_use content blocks in the response". So I imagine that behind the scenes they're _smashing it into the context_, as I already suggested; the only reason it's separate in the API is so they can type/validate it.
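For concreteness, here is a rough sketch of what a client might do with such a response. It assumes a `tool_use` block carries `id`, `name`, and `input`, and that the reply is a `tool_result` block keyed by `tool_use_id`, which is how I read the linked docs. The `get_index_level` handler and its return value are hypothetical.

```python
import json


def get_index_level(ticker):
    # Hypothetical handler; a real one would query a market data service.
    return {"ticker": ticker, "level": None}


# Tools this client knows how to run, keyed by the name the model emits.
HANDLERS = {"get_index_level": get_index_level}


def handle_response(content_blocks):
    """Run every tool_use block and build the matching tool_result blocks."""
    results = []
    for block in content_blocks:
        if block.get("type") != "tool_use":
            continue  # plain text blocks need no action from the client
        handler = HANDLERS.get(block["name"])
        if handler is None:
            output, is_error = "unknown tool: " + block["name"], True
        else:
            output, is_error = json.dumps(handler(**block["input"])), False
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": output,
            "is_error": is_error,
        })
    return results


# Example response content: one text block, one tool call.
response_content = [
    {"type": "text", "text": "Let me look that up."},
    {"type": "tool_use", "id": "toolu_01", "name": "get_index_level",
     "input": {"ticker": "^GSPC"}},
]
tool_results = handle_response(response_content)
```

Note that in this sketch the model is still the one choosing the tool; the surrounding code only looks up the name and runs it.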

I don't know what this magical toolchain is, but the LLM is the thing providing output based on the not so new magical concept of attention and statistics. I don't see how some separate "toolchain" piece takes the input string and somehow does a better job at selecting a tool than the model itself, unless the toolchain is itself a smaller LLM trained specifically for tool use outside of your larger multi-purpose/"knowledgeable" LLM.


zozbot234 315 days ago

As I mentioned in a sibling thread, you can use that JSON structured input to constrain the LLM's output during inference so that it will only contain valid tool calls, in addition to smashing it into the context. This is valuable since it's going to be far more robust than assuming that the LLM can "figure everything out" from a natural language description.
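A toy sketch of the idea, restricted to the tool name only: at each step the decoder masks out every candidate that could not lead to a valid name, then picks the best-scoring survivor. Everything here is an assumption for illustration. The tool names are made up, decoding is per character rather than per token, and `typo_model` is a stand-in for real model logits. Real implementations apply the same masking over the tokenizer's vocabulary with a grammar derived from the JSON Schema.

```python
VALID_TOOLS = ["get_weather", "get_stock_price"]  # hypothetical tool names


def allowed_next_chars(prefix, valid_names):
    """Characters that keep `prefix` a prefix of at least one valid name."""
    return {
        name[len(prefix)]
        for name in valid_names
        if name.startswith(prefix) and len(name) > len(prefix)
    }


def constrained_decode(score_fn, valid_names):
    """Greedy decoding where disallowed characters are masked out.

    Stops at the first complete valid name, so this toy version assumes
    no valid name is a prefix of another.
    """
    prefix = ""
    while prefix not in valid_names:
        allowed = sorted(allowed_next_chars(prefix, valid_names))
        scores = score_fn(prefix)  # char -> score, standing in for logits
        prefix += max(allowed, key=lambda ch: scores.get(ch, float("-inf")))
    return prefix


def typo_model(prefix):
    """Stand-in model that wants to emit the misspelled name 'get_wether'."""
    target = "get_wether"
    if len(prefix) < len(target):
        return {target[len(prefix)]: 1.0}
    return {}


chosen = constrained_decode(typo_model, VALID_TOOLS)
```

Left unconstrained, the stand-in model would emit a name no tool has; with the mask, the only reachable outputs are names from the declared list.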
