Lies, Damn Lies and Database Benchmarks (questdb.com)
14 points by eigenBasis 2 days ago
2 comments

Reminds me of the recent Terminal Bench controversy [1][2][3]

If there's a benchmark, people will cheat, lie, and optimize for that benchmark. Honesty depends on the compliance enforced on teams. But if compliance itself is weak, it is going to be taken advantage of. It's like growing up in India: you would optimize for the exam and not for what you learn from it.

[1] https://news.ycombinator.com/item?id=47920787

[2] https://www.tbench.ai/news/leaderboard-integrity-update

[3] https://debugml.github.io/cheating-agents/

Same with LLM benchmarks these days.
Well, the pelican benchmark is easily verifiable.