by int_19h, 558 days ago
Benchmarks are way too easy to game. There's no shortage of models that "beat GPT-4" according to some benchmark or another, yet are obviously nowhere close to it when you try them on novel tasks.