Show HN: SurvivalIndex – which developer tools do AI agents choose?

Show HN: SurvivalIndex – which developer tools do AI agents choose? (survivalindex.org)

1 point by scalefirst 102 days ago

We've been running coding agents against standardized repos with natural-language prompts — no tool names, no hints — and measuring what they actually choose.

Early finding: Claude Code picks Custom/DIY in 12 of 20 categories. Not because it can't use the tools (BFCL scores suggest it can) but because it doesn't reach for them. That's a different failure mode than capability benchmarks measure.

We score each tool on five inputs: agent visibility, pick rate vs Custom/DIY, cross-context breadth, expert human ratings, and implementation success rate. Tools with a survival score above 1 persist. Below that, agents build their own replacement instead of using the tool.
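To make the pick-rate input concrete, here is a minimal sketch of how it could be computed for one category. It assumes each run is logged as the name of the option the agent ended up using; `pick_rates`, `tool_a`, `tool_b`, and the run data are hypothetical and made up for illustration. It does not reproduce the actual scoring pipeline, and it says nothing about how the five inputs combine into the survival score.

```python
from collections import Counter


def pick_rates(choices):
    """Return the share of runs in which each option was chosen."""
    total = len(choices)
    if total == 0:
        # No runs recorded for this category, so there is nothing to rate.
        return {}
    counts = Counter(choices)
    return {option: n / total for option, n in counts.items()}


# Hypothetical outcomes for one category across 10 runs (made-up data).
runs = ["Custom/DIY"] * 6 + ["tool_a"] * 3 + ["tool_b"]
rates = pick_rates(runs)

for option, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{option}: {rate:.0%}")
```

Comparing a tool's rate against the `Custom/DIY` rate in the same category is one plausible way to read "pick rate vs Custom/DIY", but that reading is an assumption.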

Methodology is at survivalindex.org/methodology. Very curious what people think of the measurement approach, especially the human coefficient variable.

2 comments

scalefirst 102 days ago

One thing I'd love input on: we use expert human ratings as a variable (H) to capture whether agent choices align with what experienced engineers would actually ship. Curious if people think this is the right signal or whether it introduces too much subjectivity.

WalterGR 102 days ago

Also see https://news.ycombinator.com/item?id=47169757 - “What Claude Code chooses”

611 points | 8 days ago | 258 comments

scalefirst 101 days ago

Appreciate it. SurvivalIndex is agnostic to the LLM or coding agent. One of the more interesting findings when you run the same prompts across Claude, GPT, and Copilot is that tool selection diverges significantly by model. In Claude Code, Opus picks Drizzle as the JS ORM 100% of the time, while Sonnet still defaults to Prisma at 79%. Older models route to Redis for caching, while newer models increasingly go Custom/DIY. The "recency gradient" shows up clearly: newer models pick newer tools, sometimes before the ecosystem has validated them.

This matters because a tool's survival score shouldn't be measured against one agent. A tool that only one model knows about has a structural awareness problem. A tool three models independently converge on has something real.
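A rough sketch of that convergence check, assuming per-model pick rates for a single category are already available as dictionaries. The names `top_pick`, `converging_models`, `model_a` through `model_c`, `tool_x`, and `tool_y`, along with all the numbers, are hypothetical and not taken from SurvivalIndex data.

```python
def top_pick(rates):
    """Return the option with the highest pick rate for one model."""
    return max(rates, key=rates.get)


def converging_models(rates_by_model, tool):
    """List the models whose most frequent choice is the given tool."""
    return sorted(
        model
        for model, rates in rates_by_model.items()
        if rates and top_pick(rates) == tool
    )


# Hypothetical pick rates for one category (made-up numbers).
rates_by_model = {
    "model_a": {"tool_x": 0.7, "Custom/DIY": 0.3},
    "model_b": {"tool_x": 0.5, "tool_y": 0.2, "Custom/DIY": 0.3},
    "model_c": {"tool_y": 0.1, "Custom/DIY": 0.9},
}

print(converging_models(rates_by_model, "tool_x"))
print(converging_models(rates_by_model, "tool_y"))
```

A tool that shows up as the top pick for only one model would be the "structural awareness problem" case, and one that several models land on independently would be the stronger signal.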
