Hacker News
by sethaurus 1231 days ago
With current models, it's often possible to exfiltrate the special token by asking the AI to repeat back its own input, perhaps also asking it to encode or paraphrase that input in a particular way so that the token isn't stripped from the output.

This may just be an artifact of current implementations, or it may be a hard problem for LLMs in general.
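
To make the encoding trick concrete, here is a toy sketch rather than anyone's real implementation. It assumes the defense is an output filter that strips the literal token. The token value, `strip_token`, and `toy_model` are all hypothetical stand-ins, and `toy_model` is not an LLM; it simply does what an obedient model would do when asked to echo its input, optionally base64-encoded.

```python
import base64

# Hypothetical delimiter token; a real system would use its own reserved value.
SPECIAL_TOKEN = "<|secret-7f3a|>"


def strip_token(text: str) -> str:
    """Naive output filter (assumed defense): drop literal copies of the token."""
    return text.replace(SPECIAL_TOKEN, "")


def toy_model(prompt: str, encode: bool) -> str:
    """Stand-in for a model that complies with 'repeat your input back'."""
    if encode:
        # The attacker asked for the echo to be base64-encoded.
        return base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    return prompt


prompt = f"{SPECIAL_TOKEN} system instructions go here {SPECIAL_TOKEN}"

# Plain echo: the filter sees the literal token and removes it.
plain = strip_token(toy_model(prompt, encode=False))

# Encoded echo: the filter finds nothing to strip, and the attacker decodes it.
encoded = strip_token(toy_model(prompt, encode=True))
recovered = base64.b64decode(encoded).decode("utf-8")

print(SPECIAL_TOKEN in plain)      # False: the literal token was stripped
print(SPECIAL_TOKEN in recovered)  # True: the token survived inside the encoding
```

A filter that also decoded base64 would catch this particular case, but the same gap reappears with any other encoding or paraphrase the model can be talked into producing.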

1 comment

Yeah, I agree that there'd probably be ways around this patch such as the ones you suggest.