| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by namnnumbr 190 days ago

I tried backing out proprietary model sizes from benchmark scores, inspired by a Latent Space podcast where Artificial Analysis noted their Omniscience Accuracy numbers track parameter count better than anything else they measure.

I trained a bunch of simple linear regressions - while Omniscience Accuracy had the best fit (R2: 0.98), it predicted absurd multi‑trillion param sizes (Gemini 3 Pro ~1,254T total parameters). Artificial Analysis' Intelligence Index provided more plausible results:

Gemini 3 Pro: 3.4T Claude 4.5 Sonnet: 1.4T Claude 4.5 Opus: 4.1T GPT-5.x series in 2.9-5.3T range total parameters.

Interesting notes:

- task benchmarks (Tau²/GDPVal) aren't predictive of model size - adding price made the fit worse - sparsity or parameter activation ratios did not influence predicted sizes at all.