| The valuable lesson from Cloudflare’s claims is that if you want an LLM to perform as you expect, you have to build around its strengths and weaknesses. You can see the same behavior when you ask an LLM to code against an API that is not commonly used. When it comes to MCP tooling I followed a different path but with similar assumptions. There are tools that LLMs have been RL’d to death to use, so I’m modeling my tools after them. Specifically, I try to have a “glob” tool that lets the LLM figure out the structure, plus a search tool and a read tool, and I use regexps as much as possible for passing parameters. You can see an early version of this pattern here: https://github.com/typedef-ai/fenic/blob/main/examples/mcp/d... It has been working well, at least in terms of the model knowing how to invoke and use the tools. I have to say, though, that each model is different. I see differences between Claude Code and Codex when I use the MCP for development, at least in how good they are at retrieving the information they need. Maybe I should run some benchmarks and compare them more formally. |
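
To make the glob/search/read idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not the fenic example linked above. The in-memory `DOCS` corpus and the names `glob_tool`, `search_tool`, `read_tool`, and `Match` are hypothetical, and wiring the functions into an actual MCP server is left out.

```python
import fnmatch
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical in-memory corpus standing in for whatever the server exposes.
DOCS = {
    "docs/intro.md": "Fenic overview\nInstall with pip\n",
    "docs/api/session.md": "Session API\ncreate a session\nclose a session\n",
    "src/main.py": "import os\nprint('hello')\n",
}


@dataclass
class Match:
    path: str
    line_no: int  # 1-based line number within the document
    text: str


def glob_tool(pattern: str) -> List[str]:
    """List paths matching a shell-style pattern so the model can discover structure.

    Note: with fnmatch, `*` also matches `/`, unlike a real shell glob.
    """
    return sorted(p for p in DOCS if fnmatch.fnmatchcase(p, pattern))


def search_tool(regex: str, path_pattern: str = "*") -> List[Match]:
    """Return every line matching `regex` in the documents selected by `path_pattern`."""
    compiled = re.compile(regex)
    hits = []
    for path in glob_tool(path_pattern):
        for line_no, line in enumerate(DOCS[path].splitlines(), start=1):
            if compiled.search(line):
                hits.append(Match(path, line_no, line))
    return hits


def read_tool(path: str, start: int = 1, end: Optional[int] = None) -> str:
    """Return lines `start`..`end` (1-based, inclusive) of one document."""
    lines = DOCS[path].splitlines()
    return "\n".join(lines[start - 1:end])
```

A typical sequence would be `glob_tool("docs/*")` to see what exists, `search_tool(r"session$", "docs/*")` to locate the relevant lines, then `read_tool` on the path and line range the search returned. Keeping the parameters to a glob pattern, a regex, and a line range mirrors the tools that coding agents already tend to call.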