by ziddoap 458 days ago
How much farther can you move the goalposts? We're already almost on another planet.

You ignored almost everything in my original comment and hyper-focused on accuracy. Then, when confronted with the fact that every single example benchmark you provided is a measure of accuracy, you now say "well, it's not a benchmark about a specific person in Norway". Obviously not!

The MATH benchmark doesn't ask "what is 2+2", either. Your argument is "well, math-focused models aren't expected to accurately answer 2+2 because it isn't in the MATH benchmark". It's ridiculous.