by AdamConwayIE
127 days ago
None of the typical benchmark suites really target Codex 5.3 yet, because it still isn't available through the API. SWE-bench, for example, works by having you generate a predictions file with the model and then evaluating those predictions in the harness. Without API access to Codex 5.3, there's no way to generate the predictions in the first place.
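To make the predictions-file step concrete, here is a rough sketch of what producing one looks like. The record keys (`instance_id`, `model_name_or_path`, `model_patch`) follow the JSONL format that SWE-bench documents for predictions, as far as I know. Everything else is an assumption for illustration: `generate_patch` is a hypothetical stand-in for the model API call, and `write_predictions` is just a helper name I made up.

```python
import json


def generate_patch(problem_statement: str) -> str:
    """Hypothetical stand-in for a model API call that returns a diff.

    This is the step that cannot be done for a model with no API access.
    """
    raise NotImplementedError("no API access to the model")


def write_predictions(instances, model_name, path, generate=generate_patch):
    """Write one JSON record per benchmark instance (JSONL) and return the count.

    `instances` is assumed to be an iterable of dicts with at least
    `instance_id` and `problem_statement` keys.
    """
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for inst in instances:
            # The model has to be queried once per task instance.
            patch = generate(inst["problem_statement"])
            record = {
                "instance_id": inst["instance_id"],
                "model_name_or_path": model_name,
                "model_patch": patch,
            }
            f.write(json.dumps(record) + "\n")
            written += 1
    return written
```

The resulting file is what gets handed to the evaluation harness. With the default `generate_patch`, the loop fails on the first instance, which is the whole problem: no API, no predictions, nothing to evaluate.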