VibeBench: Measuring 1k Engineers' Opinions of New Models (vibebench.standardagents.ai)
12 points by jpschroeder 56 days ago
3 comments

"Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."

Is this true?

But I love this concept!

Oh, very true. Benchmaxxing is basically gaming the benchmarks.
Love the idea!

Page is incredibly slow on mobile, probably because of the avatars.

800 commits in a year...