Hacker News
Measuring What Matters: Construct Validity in Large Language Model Benchmarks
(oxrml.com)
3 points by Cynddl 230 days ago | 2 comments
ammaox 230 days ago
A very large review of AI benchmarks that reveals a worrying trend in their effectiveness and scientific rigor.
jruohonen 227 days ago
The Register also picked it up:
https://www.theregister.com/2025/11/07/measuring_ai_models_h...