Hacker News
by ZoomerCretin 695 days ago
Could you talk more about HuggingFace's new benchmark for LLMs? When did it become obvious that the old benchmarks were no longer sufficient?
1 comment

[author here] we interviewed the maintainer of that leaderboard if you want to hear from her directly! https://www.latent.space/p/benchmarks-201

tldr: the old benchmarks saturated, and the methodology was prone to a lot of subtle biases. as she mentions on the pod, they're already working on leaderboard v3.