by pants2 181 days ago
Benchmarks are moving closer to reality, though, with things like FrontierScience and SWE-Bench Pro.
1 comment

Maybe you are right, but maybe it’s radiology all over again.