VibeBench: Measuring 1k Engineers' Opinions of New Models (vibebench.standardagents.ai)
12 points by jpschroeder 56 days ago | 3 comments
mhi3 56 days ago
"Published benchmarks are gamed, optimized, and overfit, and no longer yield a useful signal."
Is this true?
But I love this concept!
jpschroeder 56 days ago
Oh very true. Benchmaxxing itself is basically gaming them.
ramon156 55 days ago
Love the idea!
Page is incredibly slow on mobile, probably the avatars
memoryleakgame 55 days ago
800 commits in a year...