Hacker News
by raghavtoshniwal 499 days ago
> subpar scores at benchmarks like SWE-bench

The last few models have improved remarkably on SWE-bench too. o3 scores 73%; this number was in the low teens 16 months ago. Willing to wager that SWE-bench gets saturated before the end of 2025.

> aren't particularly representative of what a real coding job

I don't know about that. A large swath of "real world" coding is writing plumbing and UIs for CRUD apps, and the models are getting really good at that as well. Anecdotally, engineers I know have gotten insanely productive with tools like Cursor.