|
|
|
|
|
by jmye
64 days ago
|
|
> I'm not sure how groundbreaking the main insight is. I think it likely is groundbreaking for a number of people (especially non-tech CTOs and VPs) who make decisions based on these benchmarks and who have never wondered what the scores are actually scoring. |
|
Whether benchmark results are misleading depends more on the reporting organization than on the benchmark. Integrity and competence play large roles in this. When OpenAI reports a benchmark number, I trust it more than when that same number is reported by a couple Stanford undergrads posting "we achieved SOTA on XYZ benchmark" all over Twitter.