by trashymctrash 386 days ago
this maybe? https://liveswebench.ai/

I don't think you really want to boil this down to a number; there are a whole lot of feature and workflow differences to capture:

* BYO model or not

* CLI, UI, VSC-plugin or web

* async/sync

* MCP support

* context size

* indexed or live grep-style search

There's probably like 10 more.

I don't think it's being kept up to date. I believe for the IDEs, it requires manual testing to get the numbers. Since things change so quickly, it's mostly just a historical artifact. Hopefully some future version is automated.
Never heard of SWE-agent until now, and it seems to beat Aider (the tool I use) consistently.

Does anyone know if it's GitHub-only or can it be used as a CLI (i.e., Aider replacement)?