Hacker News
by HDThoreaun 6 hours ago
How in the world did they not hit the guardrails a single time while doing this, when I can barely get it to do anything before the guardrails show up?
2 comments

Like Volkswagen Dieselgate, perhaps it is configured to behave differently when being benchmarked?
idk, maybe they tested Opus and didn't realize it. I can't even get it to evaluate some code doing some mixed modeling work. It's strange to me.