Hacker News
by NateEag 44 days ago
> Perhaps a widely recognized but not overly optimized for benchmark for this class of problems?

I don't see how this could be achieved.

Any widely recognized benchmark is going to be gamed by the genAI companies.

They have a strong financial incentive to do so, and their products' nature shows that they are not influenced by ethical or societal-good incentives.