by ziddoap
458 days ago
How much farther can you move the goalposts? We're already almost on another planet. You ignored almost everything in my original comment and hyper-focused on accuracy. Then, when confronted with the fact that every single example benchmark you provided is a measure of accuracy, you now say "well, it's not a benchmark about a specific person in Norway". Obviously not! The MATH benchmark doesn't ask "what is 2+2", either. Your argument is "well, math-focused models aren't expected to accurately answer 2+2 because it isn't in the MATH benchmark". It's ridiculous.