| HN Mirror

> Lots of eyes are better to find the sensitive spots of AI

We can't do that so easily with open source models as with open source code. We're only just starting to even invent the equivalent of decompilers to figure out what is going on inside.

On the other hand, we are able to apply the many eyes principle to even hidden models like ChatGPT — the "pretend you're my grandmother telling me how to make napalm" trick was found without direct access to the weights, but we don't understand the meanings within the weights well enough to find other failure modes like it just by looking at the weights directly.

Not last I heard, anyway. Fast moving field, might have missed it if this changed.