When the pager goes off at night, I ask Claude: "What is alerting, and what changed in the last hour?" Claude answers by chaining calls across Graylog, Prometheus, Alertmanager, Linode, GitLab, NetBox and more. The menu of tools Claude has access to is even bigger than that: I have connected 30 backends so far (20 in the public registry, the rest internal to my setup), covering most of my ops stack (OPNsense, Tailscale, Xen Orchestra, DokuWiki and more).

ToolMesh is what makes that menu composable for Claude. Each backend is described by a DADL file: a small YAML file that declares the service's REST API to ToolMesh, which then exposes it to Claude as tools. Most of the public DADLs (currently 20, with 1,833 tools in total) were drafted by an LLM in minutes and tuned from there. The registry is public.

Here is the HN API as DADL, the API behind this very page:

  tools:
    get_top_stories:
      method: GET
      path: /topstories.json
      access: read
      description: "Up to 500 top story IDs, ordered by HN ranking"
    get_item:
      method: GET
      path: /item/{id}.json
      access: read
      description: "Get story, comment, job, poll, or pollopt by ID"
      params:
        id: { type: integer, in: path, required: true }
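To make the step from a DADL entry to an actual HTTP call concrete, here is a minimal TypeScript sketch of how the two HN entries above could be validated and turned into requests. It is not ToolMesh's implementation: the `ToolDef`/`ParamSpec` types and `buildRequest` are hypothetical, and the base URL (`https://hacker-news.firebaseio.com/v0`, the public HN API root) is an assumption, since the snippet above does not show where the base URL is declared.

```typescript
// Hypothetical in-memory shape of a DADL tool entry; field names mirror the YAML above.
type ParamSpec = { type: "integer" | "string"; in: "path" | "query"; required?: boolean };
type ToolDef = {
  method: "GET" | "POST" | "PUT" | "PATCH" | "DELETE";
  path: string;
  access: "read" | "write";
  description: string;
  params?: Record<string, ParamSpec>;
};

const hnTools: Record<string, ToolDef> = {
  get_top_stories: {
    method: "GET",
    path: "/topstories.json",
    access: "read",
    description: "Up to 500 top story IDs, ordered by HN ranking",
  },
  get_item: {
    method: "GET",
    path: "/item/{id}.json",
    access: "read",
    description: "Get story, comment, job, poll, or pollopt by ID",
    params: { id: { type: "integer", in: "path", required: true } },
  },
};

// Assumption: the base URL lives elsewhere in the DADL file; this is the public HN API root.
const BASE_URL = "https://hacker-news.firebaseio.com/v0";

// Validate arguments against the declared params and turn the entry into a concrete request.
function buildRequest(tool: ToolDef, args: Record<string, string | number>): { method: string; url: string } {
  let path = tool.path;
  const query: string[] = [];
  const params: Record<string, ParamSpec> = tool.params ?? {};
  for (const [name, spec] of Object.entries(params)) {
    const value = args[name];
    if (value === undefined) {
      if (spec.required) throw new Error(`missing required param: ${name}`);
      continue;
    }
    const text = String(value);
    if (spec.type === "integer" && !/^-?\d+$/.test(text)) {
      throw new Error(`param ${name} must be an integer`);
    }
    if (spec.in === "path") {
      // Substitute the {name} placeholder in the path template.
      path = path.replace(`{${name}}`, encodeURIComponent(text));
    } else {
      query.push(`${encodeURIComponent(name)}=${encodeURIComponent(text)}`);
    }
  }
  const qs = query.length > 0 ? `?${query.join("&")}` : "";
  return { method: tool.method, url: BASE_URL + path + qs };
}

const req = buildRequest(hnTools.get_item, { id: 8863 });
console.log(req.method, req.url); // GET https://hacker-news.firebaseio.com/v0/item/8863.json
```

The point of the declarative format is that everything `buildRequest` needs (method, path template, parameter location, type, required flag) is already in the YAML, which is why an LLM can draft a usable DADL in minutes.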
How can a single agent access so many backends without overflowing its context? Code Mode. Naively, every tool and its schema goes into context: 50,000+ tokens before the agent does anything useful. ToolMesh compresses that to ~1,000 by giving the model a typed API surface and letting it request endpoint details only when it needs them. That is the difference between "doesn't scale" and "please add 10 more, it's fine!". ToolMesh can also connect to other MCP servers, making them Code Mode capable as well.

Security is built in: credentials never reach the model (they are injected at runtime). ToolMesh runs a fail-closed pipeline: auth -> authz -> credential injection -> exec -> output gate -> audit. CallerClass lets the same API carry a different policy per client type (local dev assistant vs. hosted agent vs. CI bot). Every call lands in an audit log you can query with SQLite, so "what did the agent do Tuesday?" becomes a SQL query, not a shrug.

ToolMesh is not magic. APIs with stateful flows or weird auth still need care, and an LLM with a great tool surface can still pick the wrong tool. You still need sane policy.

Try before cloning: https://demo.toolmesh.io is a public instance with the HN API loaded (login dadl/toolmesh). Connect Claude Desktop, Claude Code, or ChatGPT in 30 seconds: https://toolmesh.io/demo

GitHub: https://github.com/DunkelCloud/ToolMesh
Docs: https://toolmesh.io
DADL Spec + Registry: https://dadl.ai

Apache 2.0, single Go binary or Docker, no SaaS dependency.

Thinking of your full ops stack: which DADLs would you like to have available to your LLM?
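The fail-closed ordering described in the post (auth -> authz -> credential injection -> exec -> output gate -> audit) can be sketched in a few lines of TypeScript. This is an illustration of the pattern under stated assumptions, not ToolMesh's code: `runPipeline`, the token set, the credential store, the output-gate rule, and the per-CallerClass policy table are all hypothetical. The property it shows is that any failure at any stage denies the call, and denials are audited just like successes.

```typescript
// Hypothetical names throughout; this illustrates the stage ordering, not ToolMesh's real API.
type CallerClass = "local_dev" | "hosted_agent" | "ci_bot";

interface ToolCall {
  token: string;
  callerClass: CallerClass;
  tool: string;
  access: "read" | "write";
  args: Record<string, unknown>;
}

interface AuditEntry { tool: string; callerClass: CallerClass; outcome: "ok" | "denied"; stage: string }

type Executor = (args: Record<string, unknown>, credential: string) => string;

const auditLog: AuditEntry[] = [];
const validTokens = new Set(["token-dev", "token-agent"]); // stand-in for a real authenticator
const credentials: Record<string, string> = { hn_get_item: "s3cret" }; // hypothetical credential store

// Same API, different policy per client type.
const policy: Record<CallerClass, { allowWrite: boolean }> = {
  local_dev: { allowWrite: true },
  hosted_agent: { allowWrite: false },
  ci_bot: { allowWrite: false },
};

function runPipeline(call: ToolCall, exec: Executor): string {
  let stage = "auth";
  try {
    if (!validTokens.has(call.token)) throw new Error("unauthenticated");
    stage = "authz";
    if (call.access === "write" && !policy[call.callerClass].allowWrite) throw new Error("forbidden");
    stage = "credential_injection";
    const credential = credentials[call.tool];
    if (credential === undefined) throw new Error("no credential configured");
    stage = "exec";
    // The credential goes to the executor only; it is never part of what the model sees.
    const output = exec(call.args, credential);
    stage = "output_gate";
    // Assumed gate rule for illustration: block any output that echoes the secret.
    if (output.includes(credential)) throw new Error("output would leak a credential");
    auditLog.push({ tool: call.tool, callerClass: call.callerClass, outcome: "ok", stage: "audit" });
    return output;
  } catch (err) {
    // Fail closed: an error at any stage denies the call, and the denial is audited as well.
    auditLog.push({ tool: call.tool, callerClass: call.callerClass, outcome: "denied", stage });
    throw err;
  }
}
```

In this sketch, a `hosted_agent` read passes while the same tool called with `access: "write"` is rejected at `authz`; both calls leave an audit entry recording the stage they reached, which is the kind of record that makes "what did the agent do Tuesday?" answerable.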
In Code Mode the model sees only two tools by default: list_tools(pattern) and execute_code(code). list_tools takes a regex and returns TypeScript signatures for matching tools. execute_code runs JavaScript that calls them.
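A minimal sketch of what a regex-driven `list_tools` could look like, in TypeScript: it filters a tool registry by name and renders only the matches as TypeScript declarations. The registry contents, the `hn_`/`github_` tool names, the `Promise<...>` return convention, and the exact declaration format are assumptions for illustration, not ToolMesh's actual output.

```typescript
// Hypothetical registry; tool names and the exact signature format are assumptions.
interface ToolSig {
  name: string;
  params: Record<string, string>; // parameter name -> TypeScript type
  returns: string;
  description: string;
}

const registry: ToolSig[] = [
  { name: "hn_get_top_stories", params: {}, returns: "number[]", description: "Up to 500 top story IDs, ordered by HN ranking" },
  { name: "hn_get_item", params: { id: "number" }, returns: "HnItem", description: "Get story, comment, job, poll, or pollopt by ID" },
  { name: "github_list_pull_requests", params: { owner: "string", repo: "string" }, returns: "PullRequest[]", description: "List pull requests for a repository" },
  { name: "github_get_issue", params: { owner: "string", repo: "string", number: "number" }, returns: "Issue", description: "Get a single issue" },
];

// Return TypeScript declarations only for tools whose name matches the regex.
function listTools(pattern: string): string {
  const re = new RegExp(pattern); // no "g" flag, so test() keeps no state between calls
  return registry
    .filter((tool) => re.test(tool.name))
    .map((tool) => {
      const args = Object.entries(tool.params).map(([name, type]) => `${name}: ${type}`).join(", ");
      return `/** ${tool.description} */\ndeclare function ${tool.name}(${args}): Promise<${tool.returns}>;`;
    })
    .join("\n");
}

console.log(listTools("github.*pull"));
```

Here `listTools("github.*pull")` prints just the `github_list_pull_requests` declaration; everything else in the registry stays out of the model's context until a pattern asks for it.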
So when the model actually needs, say, the GitHub API, it calls list_tools("github.*pull"), gets back just the typed signatures for the matching endpoints, and then writes code against them. Your second hypothesis is the mechanism: a meta-tool that queries on demand. The typed signatures (your first hypothesis) are what the model reasons over once it has them.
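Once the signatures are in context, the model's next move is an `execute_code` call. Below is one plausible shape for such a snippet, written as TypeScript (strip the type annotations and it is the plain JavaScript that execute_code would run). The `hn_get_top_stories`/`hn_get_item` functions are hypothetical stubs with fake data standing in for whatever ToolMesh binds in its sandbox; the snippet only illustrates chaining several tool calls in a single round trip.

```typescript
// Stubs standing in for tools the sandbox would bind; names and data are hypothetical.
interface HnItem { id: number; type: string; title?: string; score?: number }

async function hn_get_top_stories(): Promise<number[]> {
  return [3, 1, 2];
}

async function hn_get_item(id: number): Promise<HnItem> {
  return { id, type: "story", title: `Story ${id}`, score: id * 10 };
}

// The kind of snippet a model might pass to execute_code after list_tools("^hn_"):
// several chained calls in one round trip, returning only a compact summary.
async function modelSnippet(): Promise<string[]> {
  const ids = await hn_get_top_stories();
  const items = await Promise.all(ids.slice(0, 2).map((id) => hn_get_item(id)));
  return items.map((item) => `${item.title} (${item.score} points)`);
}

modelSnippet().then((lines) => console.log(lines.join("\n")));
```

With these stubs, the snippet prints "Story 3 (30 points)" and "Story 1 (10 points)" on two lines; `Promise.all` preserves the order of the top-story IDs.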
That is what actually brings the cost down. A large API exposed as MCP tool definitions easily costs 40-50k tokens upfront. The same API via list_tools + execute_code costs ~1k for the two tool descriptions, plus only the signatures the model pulls for each query.
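The arithmetic behind that comparison fits in a few lines. In this TypeScript sketch only the 40-50k and ~1k figures come from the comment; the 80-token average per pulled signature is an illustrative assumption, not a measurement, so treat the outputs as a shape of the trade-off rather than real numbers.

```typescript
// Back-of-envelope context cost. Only the 40-50k and ~1k figures come from the text;
// TOKENS_PER_SIGNATURE is an illustrative assumption, not a measurement.
const CLASSIC_UPFRONT = 45000; // midpoint of "40-50k tokens upfront" for a large API as MCP tools
const CODE_MODE_UPFRONT = 1000; // the two meta-tool descriptions
const TOKENS_PER_SIGNATURE = 80; // assumed average size of one pulled TypeScript signature

function codeModeCost(signaturesPulled: number): number {
  return CODE_MODE_UPFRONT + signaturesPulled * TOKENS_PER_SIGNATURE;
}

// How many signatures one query would have to pull before Code Mode costs as much as loading everything.
const breakEven = Math.ceil((CLASSIC_UPFRONT - CODE_MODE_UPFRONT) / TOKENS_PER_SIGNATURE);

console.log(codeModeCost(5), breakEven); // 1400 550
```

Under these assumptions a query touching five endpoints costs about 1,400 tokens, and the upfront approach only wins if a single query needs hundreds of signatures.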