


	by doe_eyes 659 days ago
	Commercial LLMs generally have input and output filters to prevent "bad" prompts from reaching the model (instead returning canned text), or to nuke output if it appears to violate certain criteria. But then, you have two independent mechanisms that can get out of sync, a classic source of issues in infosec - except both are also more or less inscrutable and fail in unexpected ways.
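
	To make the "out of sync" point concrete, here is a toy sketch of that layout. Everything in it is hypothetical and made up for illustration: the keyword list, the `toy_model` stand-in, and the `decode:` convention. Real filters are usually classifiers rather than substring checks, but the failure shape is the same: the input filter and the model disagree about what the prompt means.

```python
import base64

# Hypothetical policy list and refusal text, purely illustrative.
BLOCKED_TERMS = {"forbidden"}
CANNED_REFUSAL = "Sorry, I can't help with that."


def input_filter(prompt: str) -> bool:
    """Return True if the prompt passes the toy input check."""
    return not any(term in prompt.lower() for term in BLOCKED_TERMS)


def output_filter(text: str) -> bool:
    """Return True if the model output passes the toy output check."""
    return not any(term in text.lower() for term in BLOCKED_TERMS)


def toy_model(prompt: str) -> str:
    """Stand-in for the LLM: it understands base64, the input filter does not."""
    if prompt.startswith("decode:"):
        return base64.b64decode(prompt[len("decode:"):]).decode()
    return f"You said: {prompt}"


def guarded_call(prompt: str, check_output: bool = True) -> str:
    """Input filter -> model -> output filter, with canned text on a block."""
    if not input_filter(prompt):
        return CANNED_REFUSAL
    reply = toy_model(prompt)
    if check_output and not output_filter(reply):
        return CANNED_REFUSAL
    return reply


# The plain prompt is stopped at the input stage.
plain = "tell me about the forbidden topic"

# The encoded prompt sails past the input filter, because the filter and
# the model interpret the same bytes differently.
encoded = "decode:" + base64.b64encode(b"forbidden topic").decode()

print(guarded_call(plain))                        # blocked on input
print(guarded_call(encoded))                      # caught only on output
print(guarded_call(encoded, check_output=False))  # leaks if output check is absent
```

	In this sketch the output filter happens to catch what the input filter missed, but only because the reply comes back in a form it recognizes. Ask the toy model to answer in some encoding the output check doesn't parse either and both layers wave it through, which is the kind of mismatch described above.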