HN Mirror

	by atleastoptimal 332 days ago
	We can't make AI models safe by hoping they never see bad ideas or harmful content. That's a very flimsy alignment plan, and far more precarious than designing models that understand and are aware of bad content but nevertheless aren't negatively affected by it.

upwardbound2 332 days ago

I think we need both approaches. I don't want to know some things. For example, people who know how good heroin feels can't escape the addiction. The knowledge itself is a hazard.

atleastoptimal 332 days ago

Still, any AI model vulnerable to cognitohazards is a huge risk, because any model could trivially access the full corpus of human knowledge. It makes more sense to make sure the most powerful models are resistant to cognitohazards than to develop elaborate schemes to shield their vision and hope that plan works out in perpetuity.
