Hacker News
by ALittleLight, 538 days ago
It's not saturated. 85% is average human performance, not "best human" performance. There is still room for the model to go up to 100% on this eval.