by yompal 468 days ago
LLMs do well with outcome-described tools, but APIs are written as resource-based atomic actions. By describing an API as a collection of outcomes, LLMs don't need to re-reason about how to chain those actions each time one needs to be taken.
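To make the distinction concrete, here is a minimal sketch. The `FakeCrmApi` class, its methods, and the `log_customer_interaction` tool are all hypothetical names invented for illustration, not part of any real API or of agents.json. The atomic methods stand in for resource-based endpoints; the single outcome-level function fixes the call sequence so the model does not have to work it out every time.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCrmApi:
    """Hypothetical resource-based API: each method is one atomic action."""
    contacts: dict = field(default_factory=dict)
    notes: list = field(default_factory=list)

    def find_contact(self, email):
        # Returns the contact dict, or None if it does not exist yet.
        return self.contacts.get(email)

    def create_contact(self, email, name):
        self.contacts[email] = {"email": email, "name": name}
        return self.contacts[email]

    def create_note(self, email, text):
        self.notes.append({"email": email, "text": text})
        return self.notes[-1]


def log_customer_interaction(api, email, name, summary):
    """Outcome-described tool: record that we talked to this customer.

    The find-or-create-then-annotate sequence lives here, so the model
    only has to pick one tool instead of planning three calls.
    """
    contact = api.find_contact(email) or api.create_contact(email, name)
    note = api.create_note(contact["email"], summary)
    return {"contact": contact, "note": note}
```

Exposing only `log_customer_interaction` to the model trades flexibility for reliability: the tool covers one outcome well, and anything outside it needs its own outcome-level tool.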

Also, when an OpenAPI spec gets sufficiently big, you face a needle-in-the-haystack problem (https://arxiv.org/abs/2407.01437).

3 comments

This was insightful. The re-reasoning part makes sense. So basically, MCP should be a dumbed-down version of your API that accomplishes a few tasks really well. It has to be a subset of what your API could do, because if it weren't, it would either end up being just as generic as your API or the combinatorial explosion of possible use cases would be too large.
Does anyone have any pro tips for large tool collections? (mine are getting fat)

I plan on doing the two-layered system mentioned earlier, where the first layer of tool calls is as slim as it can be, and a second layer provides more in-depth tool documentation.

And/or chunking the tools, creating embeddings, and using RAG.
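A rough sketch of what retrieval over tool descriptions could look like. Everything here is an assumption for illustration: the tool names and descriptions are made up, and `embed` is a toy bag-of-words stand-in for a real embedding model, used only so the example runs with the standard library.

```python
import math
import re
from collections import Counter

# Hypothetical tool descriptions; in practice these would be chunks
# of your real tool documentation.
TOOL_DOCS = {
    "send_sms": "send a text message sms to a phone number",
    "create_invoice": "create an invoice for a customer payment",
    "refund_payment": "refund a customer payment",
}


def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_tools(query, tool_docs, k=2):
    """Return up to k tool names ranked by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), name) for name, doc in tool_docs.items()]
    scored.sort(reverse=True)
    # Drop tools with no overlap at all instead of padding the result.
    return [name for score, name in scored[:k] if score > 0]
```

Only the top-k tools would then go into the prompt. With a real embedding model the structure stays the same, but the ranking quality depends heavily on how the tool descriptions are written and chunked.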

Funnily enough, a search tool to solve this problem was our product going into YC. Now it’s a part of what we do with wild-card.ai and agents.json. I’d love to extend the tool search functionality to all the tools in your belt.

It took us a decently long time to get the search quality to a good place. Just a heads up in case you want to implement this yourself.

I can agree this is a huge problem with large APIs. We are doing it with Twilio's API and it’s rough.
Thinking from the retrieval perspective, would it make sense to have two layers?

The first layer just describes, at a high level, the tools available and what they do, and makes the model pick or route the request (via a system prompt or a small model).

The second layer implements the actual function calling or OpenAPI spec, which then gives the model more detail on the params and structure of the request.
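Something like the following is what I have in mind, as a sketch only. The `TOOLS` catalog, its tool names, and its schemas are invented for illustration; the routing step itself (the model or small model picking a name from the slim listing) is left out.

```python
# Hypothetical tool catalog; names, summaries, and schemas are made up.
TOOLS = {
    "send_sms": {
        "summary": "Send a text message to a phone number.",
        "schema": {"to": "string (E.164)", "body": "string"},
    },
    "lookup_number": {
        "summary": "Look up carrier info for a phone number.",
        "schema": {"number": "string (E.164)"},
    },
    "list_messages": {
        "summary": "List recently sent messages.",
        "schema": {"limit": "integer", "after": "ISO 8601 timestamp"},
    },
}


def layer_one_prompt(tools):
    """First layer: one slim line per tool, no parameter details."""
    return "\n".join(
        f"- {name}: {spec['summary']}" for name, spec in tools.items()
    )


def layer_two_spec(tools, chosen):
    """Second layer: full detail for only the tool the router picked."""
    if chosen not in tools:
        raise KeyError(f"unknown tool: {chosen}")
    return {"name": chosen, **tools[chosen]}
```

The first call sends `layer_one_prompt(TOOLS)` to the model and gets back a tool name; the second call sends only `layer_two_spec(TOOLS, name)`, so the parameter schemas of every other tool never enter the context.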

That approach does a lot better, but LLMs still have a positional bias problem baked into the transformer architecture (https://arxiv.org/html/2406.07791v1). This is where the LLM favors information that appears earlier in the prompt over information that appears later, which is unfortunate for tool selection accuracy.

Since 2 steps are required anyways, might as well use a dedicated semantic search for tools like in agents.json.

Interesting. This is the first time I am hearing about intrinsic positional bias in LLMs. I had some intuition about this but nothing concrete.