by riku_iki
822 days ago
You said:

> how this performs against the same benchmark Devin was using

> ...

> Claude 3 Opus already scored around 85-86% on these benchmarks

Devin used SWE-bench, not HumanEval, so this kind of implies you said Opus got 85% on SWE-bench, which is not true. That was my confusion.