by riku_iki 822 days ago

You said:

> how this performs against the same benchmark Devin was using

> ...

> Claude 3 Opus already scored around 85-86% on these benchmarks

Devin used SWE-bench, not HumanEval, which kinda implies you said Opus got 85% on SWE-bench, which is not true. That was my confusion.