Hacker News
by rurban 108 days ago
Doing tickets, and commenting on cost and quality in the PR.

Still, the best are outstanding, and the medium ones are barely usable. I rank them by IQ, from 140 down to utterly stupid: opencode/gpt-oss-120b (local) gets 90, opencode/opus-4.6 gets 140, and codex/gpt-5.4 gets 115. All of these are for C/C++ tasks.

An expensive Chinese SWE benchmark was recently posted to arXiv, and it confirmed my evaluation.