| HN Mirror



	by sethaurus 1231 days ago
	With current models, it's often possible to exfiltrate the special token by asking the AI to repeat back its own input, perhaps also asking it to encode or paraphrase that input in a particular way so that the token isn't stripped from the output. This may just be an artifact of current implementations, or it may be a hard problem for LLMs in general.
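	To make the "encode it so it isn't stripped" point concrete, here is a minimal sketch. It assumes the mitigation is a plain string filter on the model's output. Everything in it is hypothetical: the token value, the `strip_token` filter, and the two stand-in "model" functions, which just echo the prompt rather than calling a real LLM.

```python
import base64

# Hypothetical special token; real systems would use their own value.
SECRET_TOKEN = "<|hypothetical-guard-7f3a|>"


def strip_token(output: str) -> str:
    """Assumed mitigation: remove literal occurrences of the token from output."""
    return output.replace(SECRET_TOKEN, "")


def model_repeats_verbatim(prompt: str) -> str:
    """Stand-in for a model asked to repeat its input as-is."""
    return prompt


def model_repeats_base64(prompt: str) -> str:
    """Stand-in for a model asked to repeat its input base64-encoded."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


prompt = f"{SECRET_TOKEN} Summarize the following text."

# Verbatim repetition: the literal filter catches the token.
filtered_plain = strip_token(model_repeats_verbatim(prompt))

# Encoded repetition: the filter sees no literal match and lets it through.
filtered_encoded = strip_token(model_repeats_base64(prompt))

# The attacker decodes the output on their side and recovers the token.
recovered = base64.b64decode(filtered_encoded).decode("utf-8")

print(SECRET_TOKEN in filtered_plain)  # token removed from the plain echo
print(SECRET_TOKEN in recovered)       # token survives the encoded echo
```

	The same gap applies to any reversible transformation the filter doesn't anticipate (paraphrase, character spacing, translation), which is why literal matching alone is unlikely to close it.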

1 comment

Yeah, I agree that there'd probably be ways around this patch such as the ones you suggest.