by namnnumbr
143 days ago
I tried backing out proprietary model sizes from benchmark scores, inspired by a Latent Space podcast where Artificial Analysis noted that their Omniscience Accuracy numbers track parameter count better than anything else they measure.

I fit a bunch of simple linear regressions. Omniscience Accuracy had the best fit (R²: 0.98), but it predicted absurdly large sizes (Gemini 3 Pro at ~1,254T total parameters). Artificial Analysis' Intelligence Index gave more plausible results:

- Gemini 3 Pro: 3.4T
- Claude 4.5 Sonnet: 1.4T
- Claude 4.5 Opus: 4.1T
- GPT-5.x series: 2.9-5.3T total parameters

Interesting notes:

- task benchmarks (Tau²/GDPVal) aren't predictive of model size
- adding price made the fit worse
- sparsity or parameter activation ratios did not influence predicted sizes at all.
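
For anyone who wants to try the same kind of fit, here is a minimal sketch of the idea, not the exact setup used above. It assumes the regression target is log10 of total parameters (the comment doesn't say which transform was used), and the data points are made-up placeholders, not real benchmark scores or model sizes. The names `fit_line`, `r_squared`, and `predict_params_billions` are just illustrative.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination for the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

# Placeholder data, NOT real measurements:
# (benchmark score, total parameters in billions) for models of known size.
known = [(20.0, 8.0), (30.0, 32.0), (40.0, 120.0), (50.0, 400.0), (60.0, 1000.0)]

scores = [score for score, _ in known]
# Assumption: regress on log10(params), since sizes span orders of magnitude.
log_params = [math.log10(params) for _, params in known]

slope, intercept = fit_line(scores, log_params)
fit_quality = r_squared(scores, log_params, slope, intercept)

def predict_params_billions(score):
    """Invert the log transform to get a size estimate in billions."""
    return 10 ** (slope * score + intercept)

# Extrapolate to a hypothetical closed model scoring 70 on the same benchmark.
estimate = predict_params_billions(70.0)
print(f"R^2 = {fit_quality:.3f}, estimate at score 70: {estimate:,.0f}B params")
```

Because the closed models score above every model of known size, the estimate is an extrapolation. A high R² on the known points says nothing about whether the line holds out there, which is one way to end up with a great fit and an absurd prediction.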