by vincentvandeth 119 days ago
This approach sounds clean in theory, but in production you're building a black box. When your planning agent hands off to an implementation agent and that hands off to a review agent — where did the bug originate? Which agent's context was polluted? Good luck tracing that. I went the opposite direction: single agent per task, strict quality gates between steps, full execution logs. No sub-agents. Every decision is traceable to one context window. The governance layer (PR gates, staged rollouts, acceptance criteria) does the work that people expect sub-agents to do — but with actual observability.

After 6 months in production and 1100+ learned patterns: fewer moving parts, better debugging, more reliable output. Built a full production crawler this way (26 extractors, 405 tests) without sub-agents. The orchestrator acts as a gatekeeper that redispatches incomplete work.


> Every decision is traceable to one context window

There are no models that can do all the mentioned steps in a single usable context window. This is why subagents or multi-agent orchestrators exist in the first place.

You're right that no model handles everything in one context window — that's exactly why I built context rotation. Each task runs in a single agent context (one responsibility, clear scope), and when the window fills up, the system automatically rotates: writes a structured handover, clears, and resumes in a fresh window.
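
To make the rotation idea concrete, here is a minimal sketch of "write a handover, clear, resume". It is not the actual implementation: the `Handover` fields, the token budget, the 80% threshold, and the per-step cost are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Handover:
    """Structured summary carried into the next window (hypothetical fields)."""
    task: str
    done: List[str] = field(default_factory=list)
    remaining: List[str] = field(default_factory=list)


@dataclass
class AgentContext:
    """Toy stand-in for one context window with a fixed token budget."""
    token_budget: int
    used: int = 0
    notes: List[str] = field(default_factory=list)

    def add(self, note: str, cost: int) -> None:
        self.notes.append(note)
        self.used += cost

    def is_full(self, threshold: float = 0.8) -> bool:
        # Assumed policy: rotate once 80% of the budget is consumed.
        return self.used >= self.token_budget * threshold


def run_with_rotation(
    task: str, steps: List[str], token_budget: int, cost_per_step: int
) -> Tuple[List[Handover], AgentContext]:
    """Run steps in order, rotating to a fresh context when the window fills."""
    ctx = AgentContext(token_budget)
    handovers: List[Handover] = []
    done: List[str] = []
    for i, step in enumerate(steps):
        if ctx.is_full():
            # Write the handover, drop the old window, resume in a new one.
            handovers.append(Handover(task, list(done), list(steps[i:])))
            ctx = AgentContext(token_budget)
            ctx.add(f"resumed from handover: {len(done)} steps done", cost=1)
        ctx.add(step, cost_per_step)
        done.append(step)
    return handovers, ctx
```

The point of the sketch is that the only thing crossing a rotation boundary is the handover record, so each window can be inspected on its own.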

The key distinction: sub-agents run within a parent context with shared state (black box). My approach uses independent parallel agents (separate terminals, separate context windows) that report back to an orchestrator. Large tasks get split into smaller dispatches upfront — each scoped to fit a single context window. The orchestrator can dispatch research to 3 agents in parallel, collect their outputs, then dispatch a synthesis task to a single agent that merges the findings.
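
The fan-out and merge step can be sketched roughly like this. Threads stand in for the separate terminals, and `research_agent` and `synthesize` are hypothetical placeholders for real agent sessions, so treat it as the shape of the flow rather than the real dispatcher.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def research_agent(topic: str) -> Dict[str, str]:
    """Placeholder for an independent agent; a real one runs in its own session."""
    return {"topic": topic, "finding": f"notes on {topic}"}


def synthesize(findings: List[Dict[str, str]]) -> str:
    """Placeholder for the single synthesis dispatch that merges the findings."""
    return "; ".join(item["finding"] for item in findings)


def orchestrate(topics: List[str]) -> str:
    """Dispatch one research task per topic in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(topics)) as pool:
        # Executor.map returns results in input order, whichever finishes first.
        findings = list(pool.map(research_agent, topics))
    return synthesize(findings)
```

Each worker only sees its own topic and returns a plain record, which mirrors the "no shared state" constraint.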

So it's not "one context window for everything" — it's right-sized tasks with full observability per agent, and a governance layer managing the sequence and merging results.

That sounds interesting. I do hate how there's no observability into subagents and you just get a summary.

How do they report back to the orchestrator? Tmux?

Yes, tmux. The setup is a 2x2 grid:

T0 (orchestrator) | T1 (Track A)
T2 (Track B)      | T3 (Track C)

When a worker finishes, it writes a structured report to a shared unified_reports/ directory. A file watcher (receipt processor) detects it, parses the report into a structured NDJSON receipt (status, files changed, open items, git ref), and delivers it to T0's pane.
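
A rough sketch of what a receipt processor along these lines could look like. The report format ("key: value" lines in `.md` files), the field names, and the polling approach are assumptions for illustration; the real watcher and report schema may differ, and delivery to T0's pane is left out.

```python
import json
from pathlib import Path
from typing import Dict, List, Set


def _split_list(raw: str) -> List[str]:
    """Turn 'a.py, b.py' into ['a.py', 'b.py']; an empty string gives []."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_report(text: str) -> Dict[str, object]:
    """Parse 'key: value' lines (assumed report format) into a receipt dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return {
        "status": fields.get("status", "unknown"),
        "files_changed": _split_list(fields.get("files changed", "")),
        "open_items": _split_list(fields.get("open items", "")),
        "git_ref": fields.get("git ref", ""),
    }


def process_new_reports(report_dir: Path, ledger: Path, seen: Set[str]) -> List[dict]:
    """One polling pass: convert unseen reports to receipts and append to the ledger."""
    receipts = []
    for path in sorted(report_dir.glob("*.md")):
        if path.name in seen:
            continue
        receipt = parse_report(path.read_text(encoding="utf-8"))
        receipt["report"] = path.name
        # Append mode: one JSON object per line, earlier lines are never rewritten.
        with ledger.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(receipt) + "\n")
        seen.add(path.name)
        receipts.append(receipt)
    return receipts
```

Calling `process_new_reports` in a loop with a short sleep gives a crude watcher; a filesystem-event library would do the same job without polling.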

T0 then reviews the receipt, runs a quality advisory (automated pass/warn/hold verdict), and decides: close open items, complete the PR, or redispatch. Everything is filesystem-based — no API, no database, no shared memory between agents. Each terminal has its own context window, its own Claude Code (or Codex/Gemini) session, and the only communication channel is structured files on disk.
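
For illustration, the pass/warn/hold advisory and the follow-up decision could be as simple as the sketch below. The rules and the verdict-to-action mapping are assumptions, not the actual criteria T0 uses, and the receipt field names are hypothetical.

```python
from typing import Dict


def quality_advisory(receipt: Dict[str, object]) -> str:
    """Return 'pass', 'warn', or 'hold' for a receipt (illustrative rules only)."""
    # Assumed rule: unfinished work or a missing git ref cannot be accepted.
    if receipt.get("status") != "done" or not receipt.get("git_ref"):
        return "hold"
    # Assumed rule: finished work with leftover open items needs attention.
    if receipt.get("open_items"):
        return "warn"
    return "pass"


def next_action(receipt: Dict[str, object]) -> str:
    """Map the verdict onto one of the three outcomes named in the thread."""
    actions = {
        "pass": "complete_pr",
        "warn": "close_open_items",
        "hold": "redispatch",
    }
    return actions[quality_advisory(receipt)]
```

Keeping the verdict a pure function of the receipt means the same decision can be replayed later from the ledger.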

The receipt ledger is append-only NDJSON, so you can always trace: which agent did what, when, on which dispatch, with which git commit.
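
Tracing against an append-only NDJSON file needs very little code. A minimal sketch, assuming hypothetical field names such as `agent`, `dispatch`, and `git_ref`:

```python
import json
from pathlib import Path
from typing import List


def append_receipt(ledger: Path, receipt: dict) -> None:
    """Append one receipt as a single JSON line; existing lines are never touched."""
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(receipt, sort_keys=True) + "\n")


def trace(ledger: Path, **filters: object) -> List[dict]:
    """Return receipts, in write order, whose fields equal every given filter."""
    matches = []
    with ledger.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            receipt = json.loads(line)
            if all(receipt.get(key) == value for key, value in filters.items()):
                matches.append(receipt)
    return matches
```

For example, `trace(ledger, agent="T1")` would list everything Track A reported, and adding `dispatch="..."` narrows it to one dispatch.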

I open-sourced the setup recently if you want to dig into the details.