| HN Mirror

	by matus-pikuliak 297 days ago
	That is absolutely not a reliable defense. Attackers can break these defenses. Some attack inputs look semantically meaningless to a human reader, yet they can still nudge the model into producing harmful outputs. I wrote a blog post about this: https://opensamizdat.com/posts/compromised_llms