Hacker News
by avereveard 374 days ago
There's a new set of metrics that captures advances better than MMLU or its Pro version, but nothing is as standardized yet, and very few have a hidden test set to keep gains from coming from directional fine-tuning.