


	by sjmog 357 days ago
	Yesterday I posted a video testing claude-4 sonnet solving an https://simstack.io long-horizon SWE challenge unaided (https://news.ycombinator.com/item?id=44424468). For comparison, here's gemini 2.5-pro. I noticed that 2.5-pro is way more cavalier than claude: it skips backups and tries "more stuff more quickly", where claude takes a more cautious approach.