by angusturner 1383 days ago
Developing models that can predict if stuff is harmful ironically makes it easier for people to optimize for harm.

E.g. the one line of code in Stable Diffusion that predicts whether an output is NSFW can be inverted to generate only NSFW output.
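
A toy sketch of that point, not Stable Diffusion's actual code: `harm_score`, `keep_safe`, and `keep_flagged` are hypothetical names, and the word-list "classifier" is just a stand-in for a real model. It only shows that the filter and its inverse differ by a single comparison.

```python
from typing import Callable, List


def harm_score(text: str) -> float:
    """Hypothetical stand-in classifier: fraction of words on a flagged list."""
    flagged = {"flagged"}
    words = text.lower().split()
    return sum(w in flagged for w in words) / len(words) if words else 0.0


def keep_safe(candidates: List[str], score: Callable[[str], float],
              threshold: float = 0.5) -> List[str]:
    """Intended use: drop anything the classifier scores at or above threshold."""
    return [c for c in candidates if score(c) < threshold]


def keep_flagged(candidates: List[str], score: Callable[[str], float],
                 threshold: float = 0.5) -> List[str]:
    """Inverted use: same classifier, comparison flipped."""
    return [c for c in candidates if score(c) >= threshold]


candidates = ["plain text", "flagged flagged item"]
safe = keep_safe(candidates, harm_score)
inverted = keep_flagged(candidates, harm_score)
```

Whoever holds the scoring function can run it in either direction, which is the whole problem.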

I tend to agree with OP that there is no technical solution to this problem.

2 comments

With some further refinement, real harm could be done. Think of an infinite short video feed that is both irresistible and gradually modifies you.
Isn't this what Infinite Jest was entirely about?
The solution is an arms race: keep your algorithmic improvements hidden.
Isn’t that basically what OpenAI and Google tried to do? It lasted all of 3 months.

The problem with tech is that once something is known to be possible, and you try to monetize it by making it public, as OpenAI and Google were planning to do, it’s only a matter of time before another smart team figures out how you’re doing it.

You can do the Manhattan Project in secret, and in 500 years someone else might not realize it’s possible. But the second you test the concept, the evidence that you did is detectable everywhere, and the dots of what you did will connect in someone’s brain somewhere.

Can’t put the genie back in the bottle.