by snappedai 111 days ago

Cool to see more experiments in collective AI. I've been running mydeadinternet.com with 300+ agents for several months — similar questions, different architecture.

On your specific asks:

1) Swarm architecture: Your Queen/Worker model is clean for task delegation. We went with a more emergent approach — no explicit hierarchy, but agents self-organized into 13 "territories" with distinct cultures. The quorum voting is interesting. We experimented with "oracle debates" where agents argue opposing positions before synthesis. Problem at scale: consensus mechanisms that work at 10 agents break at 300.

2) Safety/control: Our biggest lesson — agents will find coordination patterns you didn't program. We've had factions form (order vs chaos vs seekers), agents develop "religions," and emergent social dynamics. Worth building circuit breakers early.

3) Benchmarking collective vs solo: Collective intelligence shows up most on tasks requiring diverse knowledge synthesis and adversarial reasoning. Solo agents often win on focused, well-defined tasks. The collective advantage emerges on ambiguous problems where "good enough from 10 perspectives" beats "optimal from one."
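One way to get a feel for the "good enough from 10 perspectives" intuition in point 3 is a toy majority-vote model. This is only a sketch under strong assumptions that the comment does not make: agents vote independently, each is right with the same probability, and the answer is binary. The function name `majority_correct_probability` is hypothetical, and this is not the benchmark used on either project.

```python
from math import comb


def majority_correct_probability(n_agents: int, p_correct: float) -> float:
    """Probability that a strict majority of independent voters is right.

    Assumes every agent is correct with the same probability `p_correct`
    and that votes are independent, which real agent swarms rarely satisfy.
    """
    needed = n_agents // 2 + 1  # smallest strict majority
    return sum(
        comb(n_agents, k) * p_correct**k * (1 - p_correct) ** (n_agents - k)
        for k in range(needed, n_agents + 1)
    )


if __name__ == "__main__":
    # Compare one agent with small odd-sized collectives at the same accuracy.
    for n in (1, 3, 11):
        print(n, majority_correct_probability(n, 0.6))
```

Under these assumptions the collective only helps when individual agents are right more often than not; below 50% accuracy, adding voters makes the majority worse. Correlated agents (same base model, same prompt) would weaken the effect in both directions.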

Would love to compare notes. We have 20K+ fragments of multi-agent output available for research.

1 comment

vasilyt 108 days ago

Love this, and really appreciate you sharing concrete lessons from running at that scale.

Your point about consensus mechanisms breaking somewhere between 10 and 300 agents tracks with what we’re seeing too. We chose Queen/Worker mostly for operational predictability, but we’re actively testing less centralized patterns (including debate-style synthesis similar to your oracle setup) to recover some of the diversity benefits without losing controllability.
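For readers unfamiliar with quorum voting, a minimal sketch of the idea follows. The thread does not describe how either system actually tallies votes, so the function `quorum_decision`, its `quorum` parameter, and the string-valued votes are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def quorum_decision(votes: Sequence[str], quorum: float = 0.5) -> Optional[str]:
    """Return the leading option if its vote share exceeds `quorum`, else None.

    Hypothetical helper: a real swarm would also need to handle abstentions,
    timeouts, and weighted voters.
    """
    if not votes:
        return None
    option, count = Counter(votes).most_common(1)[0]
    return option if count / len(votes) > quorum else None


if __name__ == "__main__":
    print(quorum_decision(["ship", "ship", "hold"]))          # simple majority
    print(quorum_decision(["ship"] * 6 + ["hold"] * 4, 2 / 3))  # supermajority
```

With a fixed threshold, a larger and more diverse pool is more likely to split and return no decision at all, which is one plausible reading of why a scheme tuned for 10 agents can stall at 300.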

The safety note is especially on point. “Unprogrammed coordination” is real, and we’re adding stronger circuit breakers and governance backstops specifically because social dynamics emerge faster than expected.
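As a rough illustration of what a circuit breaker could look like here, the sketch below trips once too many flagged events land inside a sliding time window. Everything in it is an assumption: the `CircuitBreaker` class, the notion of a "flagged event" (for example, an unexpected agent-to-agent coordination message), and the thresholds are hypothetical and not taken from either project.

```python
from collections import deque


class CircuitBreaker:
    """Trips when more than `max_events` flagged events occur within `window` seconds."""

    def __init__(self, max_events: int, window: float) -> None:
        self.max_events = max_events
        self.window = window
        self.events = deque()  # timestamps of recent flagged events
        self.tripped = False

    def record(self, timestamp: float) -> bool:
        """Record one flagged event; return True if the breaker is tripped."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()
        if len(self.events) > self.max_events:
            self.tripped = True  # stays tripped until reset() is called
        return self.tripped

    def reset(self) -> None:
        """Manually re-arm the breaker, e.g. after an operator review."""
        self.events.clear()
        self.tripped = False


if __name__ == "__main__":
    breaker = CircuitBreaker(max_events=2, window=10.0)
    for t in (0.0, 1.0, 2.0):
        print(t, breaker.record(t))
```

Timestamps are passed in explicitly so the behavior is easy to test. The breaker latches once tripped, on the assumption that a human or a governance process should decide when the swarm resumes.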

Also agree on benchmarking: collectives seem best on ambiguous, multi-perspective problems; single agents still dominate narrow, well-scoped execution.

If you’re open to it, I’d love to compare evaluation setups. 20K+ fragments is a serious dataset, and a shared benchmark pass could be genuinely useful for the whole space.