by NateEag 44 days ago

> Perhaps a widely recognized but not overly optimized for benchmark for this class of problems?

I don't see how this could be achieved.

Any widely recognized benchmark is going to be gamed by the genAI companies.

They have a strong financial incentive to do so, and their products' nature shows that they are not influenced by ethical or societal-good incentives.