HN Mirror



	by avbanks 488 days ago
	I still find 3.5 Sonnet the best for my coding tasks (better than o1, o3-mini, and R1). The other models might be trying to game the system by fine-tuning for the benchmarks.

1 comment

Would love to know just how overfit a lot of them are to these benchmarks.