by rurban
108 days ago
Doing tickets and commenting on cost and quality in the PR. Still, the best are outstanding, and the medium ones barely usable. I rank them by IQ, from 140 down to utterly stupid. opencode/gpt-oss-120b (local) got a 90. opencode/opus-4.6 gets 140. codex/gpt-5.4 gets 115. All for C/C++ tasks. There was one expensive Chinese SWE benchmark posted recently to arXiv. It confirmed my evaluation.