| HN Mirror



	by Jgoauh 264 days ago
	Have you tried https://artificialanalysis.ai/?

2 comments

JimDugan 264 days ago

Dumb collation of benchmarks that the big labs are essentially training on. Livebench.ai is the industry standard - non-contaminated, with new questions every few months.


IgorPartola 264 days ago

Thanks! Are the scores in some way linear here? As in, if model A is rated at 25 and model B at 50, does that mean I will have half the mistakes with model B? Get answers that are 2x more accurate? Or is it subjective?


esafak 264 days ago

I believe the score represents the fraction of correct answers, so yes.
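
To make the arithmetic concrete, here is a rough sketch that assumes the score really is the percentage of correct answers on a 0-100 scale (the thread does not confirm how the site computes it). The function name `compare` is made up for illustration. Under that assumption, going from 25 to 50 doubles the number of correct answers, but the mistakes drop from 75 to 50 out of 100, which is a third fewer rather than half.

```python
def compare(score_a: float, score_b: float) -> dict:
    """Compare two scores, ASSUMING each is a percentage of correct answers (0-100)."""
    if not (0 < score_a < 100):
        # Avoid dividing by zero when model A gets nothing or everything right.
        raise ValueError("score_a must be strictly between 0 and 100")
    errors_a = 100 - score_a  # percentage of wrong answers for model A
    errors_b = 100 - score_b  # percentage of wrong answers for model B
    return {
        "correct_ratio": score_b / score_a,  # how many times more correct answers B gives
        "error_ratio": errors_b / errors_a,  # B's mistakes as a fraction of A's mistakes
    }


result = compare(25, 50)
print(result)
```

So "2x the score" and "half the mistakes" only coincide in special cases; for example, 80 versus 90 is a small relative gain in correct answers but does halve the error rate.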


alexeiz 263 days ago

It says the best "coding index" is held by Grok 4 and Gemini 2.5 Pro. Give me a break. Nobody uses those models for serious coding. It's dominated by Sonnet 4/Opus 4.1 and GPT-5.
