by raghavtoshniwal
499 days ago
> subpar scores at benchmarks like SWE-bench

The last few models have improved remarkably on SWE-bench too. o3 scores 73%; this number was in the low teens 16 months ago. I'm willing to wager that SWE-bench gets saturated before the end of 2025.

> aren't particularly representative of what a real coding job

I don't know about that. A large swath of "real world" coding is writing plumbing and UIs for CRUD apps, and they're getting really good at that as well. Anecdotally, engineers I know have gotten insanely productive with tools like Cursor.