| In short, there are four ways to handle MCP sampling risks:

1. Capability gating - Don't declare the sampling capability during initialization for external or untrusted servers. Keep it enabled only for trusted internal ones.
2. Human approval loops - Require manual review before any sampling request reaches your LLM. The protocol says "SHOULD", not "MUST", so implementations vary.
3. Token limiting - Cap max_tokens client-side when calling LLM APIs. Again, this relies on individual devs following policy.
4. True MCP proxy - Terminate and re-establish connections (not just network filtering). This enables granular controls like "sampling for tool A but not B."

The real issue: the first three strategies depend on individual developers following security policies. Only #4 gives centralized control. Sampling is a double-edged sword: it shifts LLM costs from server to client (good for internal workflows) but opens the door to denial-of-wallet attacks from malicious external servers. Most orgs probably don't even know this feature exists yet.
Worth noting: the travel booking example is compelling. Instead of the travel team paying for tokens to format JSON responses, the requesting department's LLM budget covers it. Smart cost allocation, if you can secure it properly. |
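
To make the first three strategies concrete, here is a rough Python sketch of what the client-side checks might look like. It is not taken from any MCP SDK: `TRUSTED_SERVERS`, `MAX_TOKENS_CAP`, `client_capabilities`, `SamplingGate`, and the server name `internal-travel` are all made up for illustration. The only protocol details assumed are that the client declares a `sampling` capability at initialization and that sampling requests carry a `maxTokens` parameter.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical allowlist of internal servers trusted to use sampling.
TRUSTED_SERVERS = {"internal-travel"}
# Assumed per-request ceiling; tune to your own budget.
MAX_TOKENS_CAP = 512


def client_capabilities(server_name: str) -> dict:
    """Build the capabilities object for the initialize request.

    Strategy 1: declare sampling only for trusted servers.
    """
    caps: dict = {}
    if server_name in TRUSTED_SERVERS:
        caps["sampling"] = {}
    return caps


@dataclass
class SamplingGate:
    """Checks applied before a sampling request reaches the LLM."""

    # Strategy 2: human approval hook; returns True to allow the request.
    approve: Callable[[str, dict], bool]
    # Assumed per-server token budget for the lifetime of this gate.
    token_budget: int = 10_000
    spent: dict = field(default_factory=dict)

    def handle(self, server_name: str, params: dict) -> dict:
        if server_name not in TRUSTED_SERVERS:
            raise PermissionError(f"sampling not enabled for {server_name}")

        # Strategy 3: clamp whatever the server asked for.
        requested = int(params.get("maxTokens", MAX_TOKENS_CAP))
        capped = min(requested, MAX_TOKENS_CAP)

        # Budget is charged by the reserved cap, not by actual usage.
        used = self.spent.get(server_name, 0)
        if used + capped > self.token_budget:
            raise RuntimeError(f"sampling budget exhausted for {server_name}")

        safe_params = {**params, "maxTokens": capped}
        if not self.approve(server_name, safe_params):
            raise PermissionError("sampling request denied by reviewer")

        self.spent[server_name] = used + capped
        return safe_params  # forward these params to the LLM call
```

Note that this still lives in each client, so it has the same weakness as the strategies it implements: every developer has to wire it in. A proxy would apply the same kind of checks in one place.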