by sjmog
357 days ago
Yesterday I posted a video testing Claude 4 Sonnet solving a long-horizon SWE challenge from https://simstack.io unaided (https://news.ycombinator.com/item?id=44424468). For comparison, here's Gemini 2.5 Pro. I noticed that 2.5 Pro is far more cavalier: it skips backups and tries "more stuff more quickly", whereas Claude takes a more cautious approach.