Hacker News
by discobot 556 days ago
The problem is that the last generation of the largest models failed to outperform smaller models on the benchmarks; see the lack of a new Claude Opus or GPT-5. The problem is probably in the benchmarks, but anyway.