
The three bootstrap tools are a partial answer to (1): the tool surface never grows, only the registry does, so context pollution is bounded by the search interface rather than the full tool list. Whether the registry search stays useful as it grows is an open question; semantic search over capability definitions is probably the next step.
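To make the "bounded by the search interface" point concrete, here is a rough sketch of a registry whose search returns at most a fixed number of hits regardless of how many capabilities are registered. The `Capability` and `Registry` names, the keyword-overlap scoring, and the `limit` parameter are all assumptions for illustration, not Tendril's actual code.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    name: str
    description: str


class Registry:
    """Hypothetical flat registry; illustrative only."""

    def __init__(self) -> None:
        self._caps: dict[str, Capability] = {}

    def register(self, cap: Capability) -> None:
        self._caps[cap.name] = cap

    def search(self, query: str, limit: int = 3) -> list[Capability]:
        # Naive keyword overlap. A semantic search would swap this scoring
        # step for embedding similarity but keep the same bounded result size.
        terms = set(query.lower().split())
        scored = []
        for cap in self._caps.values():
            text = f"{cap.name} {cap.description}".lower().replace("_", " ")
            score = len(terms & set(text.split()))
            if score:
                scored.append((score, cap))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        # The agent only ever sees `limit` entries, however large the registry.
        return [cap for _, cap in scored[:limit]]
```

The context cost is then set by `limit`, not by the registry size, which is the property described above. Whether keyword overlap ranks well at scale is exactly the open question.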

(2) is where the structured capability format earns its keep over free-text memory. Triggers and suppression conditions give you inspectable, versioned invocation policy rather than prose that degrades over time. Still early though.
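A minimal sketch of what inspectable invocation policy could look like, assuming triggers and suppression conditions are expressed as simple context tags. The field names (`triggers`, `suppress_when`, `version`) and the tag values are hypothetical, not the real capability schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityPolicy:
    """Hypothetical structured policy; field names are assumptions."""

    name: str
    version: int
    triggers: tuple[str, ...] = ()
    suppress_when: tuple[str, ...] = ()


def should_invoke(policy: CapabilityPolicy, context_tags: set[str]) -> bool:
    # Suppression wins over triggers, so the policy is easy to audit:
    # any matching suppression tag blocks the capability outright.
    if any(tag in context_tags for tag in policy.suppress_when):
        return False
    return any(tag in context_tags for tag in policy.triggers)


policy = CapabilityPolicy(
    name="record_decision",
    version=1,
    triggers=("architecture_choice",),
    suppress_when=("dry_run",),
)
```

Because the policy is plain data, it can be diffed and versioned like any other file, which free-text memory does not give you.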

(3) I don't have a good answer to yet. Your point about feedback loops is the right framing — knowing whether the agent is actually getting better rather than just accumulating more tools is unsolved. The audit angle (administrators reasoning about which tools fire, when, and whether they should) is where I think this needs to go, but I haven't built that layer.

One thing that might directly address your caching point though — ADRs (Architecture Decision Records). The article that spawned Tendril started with giving an agent a record_decision capability that wrote ADRs to the filesystem. ADRs as agent cache is an interesting framing: structured, persistent, searchable records of why decisions were made at the moment they were made. That's arguably a better cache primitive than summarisation — decisions don't degrade the way summaries do, and they give you something to reason about for regression detection too.
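For illustration, a `record_decision` capability along those lines might look roughly like this. The signature, the numbered-file naming, and the Context/Decision/Consequences layout are assumptions based on the common ADR template, not the code from the original article.

```python
import re
from datetime import date
from pathlib import Path


def record_decision(
    adr_dir: Path, title: str, context: str, decision: str, consequences: str
) -> Path:
    """Write one ADR as a numbered Markdown file (hypothetical layout)."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    # Number sequentially based on the ADR files already present.
    number = len(list(adr_dir.glob("*.md"))) + 1
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    path = adr_dir / f"{number:04d}-{slug}.md"
    body = (
        f"# {number}. {title}\n\n"
        f"Date: {date.today().isoformat()}\n\n"
        f"## Context\n\n{context}\n\n"
        f"## Decision\n\n{decision}\n\n"
        f"## Consequences\n\n{consequences}\n"
    )
    path.write_text(body, encoding="utf-8")
    return path
```

Each record is written once at decision time and never rewritten, which is what makes it usable as a cache: later runs can grep or search the directory instead of relying on a lossy summary.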

Your tree/hierarchy observation resonates: the registry is a flat index right now, which probably doesn't scale past a few dozen capabilities without some grouping structure.