Hacker News
by easygenes 28 days ago
Their methods are calibrated only on open models (of course), and they admit very broad confidence bounds. You can also see that there are major confounders just by comparing their estimates for the same models at different reasoning levels. I would err on the absolute lowest end of their estimates for frontier models (e.g. 3T for GPT-5.5, 1.5-2T for Opus 4.5+).