| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by theptip 980 days ago
	It’s “alignment” in the broad sense of aligning to the goals of the org that deploys the AI system. The downstream effects are different than social engineering, even if the methods overlap (they are not the same though). The observation being there are no underlying “human values” like “don’t kill” to fall back on; if you pop a prompt hack you can have the AI take on any personality including murderous psychopath. Right now all that amounts to is amusing angry messages but hopefully it’s easy to see why that would cause alignment-as-safety issues when LLMs are embodied, for example.