


	by yunusabd 54 days ago
	Calling it SOTA might be a bit provocative, but what actually is the "state of the art"? We have benchmarks, but they are increasingly gamed and don't necessarily reflect a model's actual performance; see Opus 4.7. So I think it's useful to have real-world data from actual users as an additional data point.

1 comment

Maybe you shouldn't be relying on something if you can't even tell how good it is?