| HN Mirror

	by numbers 55 days ago
	I've stopped trusting these "trust me bro" benchmarks and just started going to LM Arena and looking for the actual benchmark comparisons. https://arena.ai/leaderboard/code

2 comments

stri8ted 55 days ago

I doubt this is representative of real-world usage. There is a difference between a few turns in a web chatbot and many-turn CLI usage on a real project.


nba456_ 55 days ago

This is not any better of a benchmark.
