by danenania
404 days ago
Thanks, I'll check it out. I'm working on a coding agent, and MCP has been a frequently requested feature, but yeah, this issue has been my main hesitation. Getting even basic prompts that are designed to do one or two things to work reliably requires so much testing and iteration that I'm inherently pretty skeptical that "here are 10 community-contributed MCPs, choose the right one for the task" will have any hope of working reliably. Of course, the benefits if it did work are very clear, so I'm keeping a close watch on it. Evals seem like a key piece of the puzzle, though you still might end up in combinatorial explosion territory trying to test all the potential interactions between multiple MCPs. I could also see it getting very expensive to test this way.
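To put rough numbers on the combinatorial worry, here is a back-of-the-envelope sketch. It assumes an eval suite that exercises every combination of two or more installed MCP servers; the function names, task count, and trial count are all made up for illustration, not taken from any real eval framework.

```python
from math import comb


def count_server_subsets(n_servers: int, min_size: int = 2) -> int:
    """Count the distinct server combinations with at least min_size members."""
    return sum(comb(n_servers, k) for k in range(min_size, n_servers + 1))


def count_eval_runs(n_servers: int, n_tasks: int, trials_per_task: int) -> int:
    """Total model runs if every combination is tested on every task.

    Assumes (hypothetically) that each task is repeated trials_per_task
    times to smooth over non-deterministic model output.
    """
    return count_server_subsets(n_servers) * n_tasks * trials_per_task


if __name__ == "__main__":
    print(comb(10, 2))                 # pairwise combinations only
    print(count_server_subsets(10))    # all combinations of 2 or more
    print(count_eval_runs(10, 20, 5))  # 20 tasks, 5 trials each
```

With 10 servers, testing pairs alone is 45 combinations, but covering every combination of two or more is 1,013, and at 20 tasks with 5 trials each that is already over 100,000 model runs. In practice you would sample combinations rather than enumerate them, which is where the cost/coverage tradeoff comes in.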
|
But agree that even basic prompts can be a struggle. You often need to name the tool in the prompt to get things to work reliably, but that's an awful user experience. Tool call descriptions play a pretty vital role, but most MCP servers are severely lacking in this regard.
I hope this is a result of everything being so new, and that the tooling and models will evolve to solve these issues over time.
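As a rough illustration of the description gap, here is a sketch comparing a thin tool definition with a fuller one. The `name`, `description`, and `inputSchema` fields follow the shape of an MCP tool definition; the tool itself (`search_issues`), its parameters, and the word-count check are hypothetical and only meant to show the idea.

```python
# A thin definition: the model has almost nothing to go on when choosing a tool.
thin_tool = {
    "name": "search_issues",  # hypothetical tool name
    "description": "Search issues.",
    "inputSchema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# A fuller definition: says when to use the tool and what each input means.
fuller_tool = {
    "name": "search_issues",
    "description": (
        "Search the project's issue tracker by keyword. Use this when the "
        "user asks about existing bugs or feature requests. Returns matching "
        "issue titles and IDs, not the full issue bodies."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "q": {
                "type": "string",
                "description": "Keywords to match against issue titles and bodies.",
            }
        },
        "required": ["q"],
    },
}


def description_is_thin(tool: dict, min_words: int = 8) -> bool:
    """Crude, hypothetical lint: flag descriptions shorter than min_words."""
    return len(tool.get("description", "").split()) < min_words


if __name__ == "__main__":
    print(description_is_thin(thin_tool))
    print(description_is_thin(fuller_tool))
```

A word count is obviously not a real measure of quality, but even a check this crude would flag a lot of servers, and it is the kind of thing tooling could enforce before a server is published.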