by TOMDM 1171 days ago
I think a lot of these LLM benchmarks should include a human average; otherwise I don't really have a frame of reference other than my personal experience with the models.

Human average can be a misleading statistic, because the average human is useless for almost everything. In almost every job, the average person doing that job is far better at it than the average member of the general population.

I think this just demonstrates how the goalposts are shifting, though.

Until pretty recently, most people would probably have said “the average human is very flexible at solving reasoning tasks, compared to machines, which find reasoning incredibly challenging.”

Now it’s “well, of course this AI, which wasn’t specifically trained for verbal reasoning, can beat an average human at verbal reasoning; humans are useless at almost everything!”

Your goalpost seems to be that GPT needs to be better than experts in their field to be considered “good” at something, but I think it’s interesting to reflect that this is the benchmark we are applying now.