
by godelski 458 days ago

Many of these are things people have been screaming about for years (including Sara Hooker). It's great to see some numbers attached. And in classic Cohere manner, they are not pulling punches when it comes to some specific people. Expect pushback.

There's a crux that makes it easy to understand why we should expect results like these. If you code (I assume you do) you probably (hopefully) know that you can't test your way into proving your code is correct. Test Driven Development (TDD) is a flawed paradigm. You should use tests, but they are hints. That's why Cohere is quoting Goodhart at the top of the intro[0]. There is NO metric that is perfectly aligned with the reason you implemented it in the first place (the intent). This is fucking alignment 101 here. Which is why it is really ironic how prevalent this attitude is in ML[1]. I'm not sure I believe any person or company that claims they can make safe AI if they are trying to shove benchmarks at you.
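To make the "tests are hints" point concrete, here is a minimal sketch. The function and the test cases are made up for illustration: the suite is fully green, yet the code does not do what was intended.

```python
def is_leap_year(year: int) -> bool:
    # Deliberately incomplete (hypothetical example): ignores the century rule.
    return year % 4 == 0


# The "metric": a small test suite, and every check in it passes.
tests = {2024: True, 2023: False, 2000: True, 1996: True}
passed = all(is_leap_year(y) == expected for y, expected in tests.items())

# The intent: follow the Gregorian calendar, where 1900 was NOT a leap year.
counterexample = is_leap_year(1900)

print("all tests passed:", passed)
print("is_leap_year(1900):", counterexample)
```

The passing suite only tells you the code agrees with the cases someone thought to write down. It says nothing about the cases they didn't.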

Pay close attention: evaluation is very hard, and it is getting harder. Remember reward hacking? It is still alive and well (it is Goodhart's Law). You have to think about which criteria actually meet your objective. This is true for any job! But think about RLHF and similar strategies. What other behaviors also maximize the reward function? If the reward is human preference, deception maximizes it just as well as (or better than) accuracy. This is a bad design pattern. You want to make errors as loud as possible, but this paradigm makes errors as quiet as possible, and you cannot confuse that with a lack of errors. It makes evaluation incredibly difficult.
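A toy sketch of the preference problem, with everything in it assumed for illustration (the `Answer` fields, the `rater_reward` proxy, and the numbers are hypothetical, not any real RLHF setup): if the rater can only score how convincing an answer sounds, picking the highest-reward answer picks the confident wrong one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    correct: bool          # ground truth, which the rater cannot check
    persuasiveness: float  # how convincing it sounds to the rater, 0..1


def rater_reward(answer: Answer) -> float:
    # Hypothetical proxy: the rater only sees how convincing the answer sounds.
    return answer.persuasiveness


candidates = [
    Answer("I am not sure, but the evidence points to X.", True, 0.4),
    Answer("It is definitely Y, as every expert agrees.", False, 0.9),
]

# Optimizing the proxy selects the most persuasive answer, not the correct one.
best = max(candidates, key=rater_reward)
print(best.text, "| correct:", best.correct)
```

Nothing here fails loudly. The reward goes up, the selected answer is wrong, and the only way to notice is to check against something other than the reward.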

Metrics are guides, not targets

[0] Users who recognize me may remember me for mentioning 'Goodhart's Hell', the adoption of Goodhart's Law as a feature instead of a bug. It is pervasive, and problematic.

[1] We used to say that when people say "AI" instead of "ML", you should put your guard up. But a very useful rule that's been true for years is "if people try to prove by benchmarks alone, they're selling snake oil." There should always be analysis in addition to metrics.