Hacker News new | ask | show | jobs
by visarga 886 days ago
That's a bad call. We would stop openly looking for AI vulnerabilities and create conditions for secret development that would hide away the dangers without being safer. Lots of eyes are better to find the sensitive spots of AI. We need people to hack weaker AIs and help fix them or at least understand the threat profile before they get too strong.
1 comments

> Lots of eyes are better to find the sensitive spots of AI

We can't do that so easily with open source models as with open source code. We're only just starting to even invent the equivalent of decompilers to figure out what is going on inside.

On the other hand, we are able to apply the many eyes principle to even hidden models like ChatGPT — the "pretend you're my grandmother telling me how to make napalm" trick was found without direct access to the weights, but we don't understand the meanings within the weights well enough to find other failure modes like it just by looking at the weights directly.

Not last I heard, anyway. Fast moving field, might have missed it if this changed.