Hacker News
by
Jimmc414
60 days ago
Goodhart’s Law in reverse: what can’t be gamed gets rejected.
2 comments
stephen_cagle
60 days ago
You've almost buffer-overrun Goodhart's Law into the McNamara fallacy (https://en.wikipedia.org/wiki/McNamara_fallacy). :]
cbg0
60 days ago
SWE-bench Verified was created in collaboration with OpenAI. It's also an open dataset, so it's prone to contamination, meaning it can be gamed.