by ALittleLight 538 days ago
It's not saturated. 85% is average human performance, not "best human" performance. There is still room for the model to go up to 100% on this eval.
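
To put a number on that, here is a quick sketch of the arithmetic. The 85% figure is the one quoted above; the `headroom` helper is a hypothetical name made up for illustration and is not part of any eval harness.

```python
def headroom(score: float, ceiling: float = 1.0) -> float:
    """Fraction of the eval still unsolved at a given score."""
    if not 0.0 <= score <= ceiling:
        raise ValueError("score must lie between 0 and ceiling")
    return ceiling - score


# 0.85 is the average-human figure quoted above; 1.0 is the eval's maximum.
remaining = headroom(0.85)
print(f"{remaining:.0%} of the eval is still unsolved at the average-human mark")
```

So matching the average human still leaves roughly 15 points on the table, which is why I wouldn't call it saturated.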