by JTon 955 days ago
Thanks! So is patching this as simple as not allowing the entire space of X for user prompt? i.e. guaranteeing some amount of X for model owner's instructions

No. The input and the output are the same thing with transformers. Internally, you're providing them with some sequence of tokens and asking them to continue the sequence. If the sequence grows past their capacity, the earliest tokens (which can include the original instructions) drop out of view, and they can "forget" what they were doing.
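
To make the "forgetting" concrete, here's a toy sketch. It assumes a fixed-size window that simply keeps the most recent tokens; the `visible_context` helper, the `SYS:` prefix, and the window size are all made up for illustration, and real serving stacks may truncate differently.

```python
def visible_context(tokens, window):
    """Return the last `window` tokens: what a fixed-capacity model would see."""
    return tokens[-window:] if window > 0 else []


# Hypothetical tokens; a real tokenizer would produce something very different.
instructions = ["SYS:translate", "SYS:to", "SYS:French"]
user_input = ["hello", "world"]

# Instructions, input, and output all live in one sequence.
sequence = instructions + user_input
window = 6

# Simulate generation: every generated token is appended to that same sequence.
for step in range(4):
    sequence.append(f"out{step}")

context = visible_context(sequence, window)
instructions_visible = [t for t in context if t.startswith("SYS:")]

print(context)
print(instructions_visible)
```

After four generated tokens the sequence is nine tokens long, so with a window of six none of the `SYS:` tokens are still in view.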

The "obvious" fix for this is to ensure that their instructions are always within their horizon. But that has lots of failure modes as well.
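
A rough sketch of that "obvious" fix, again under toy assumptions: reserve part of the window for the instructions and only truncate everything else. `pinned_context` is a hypothetical helper, not any library's API.

```python
def pinned_context(instructions, rest, window):
    """Always keep `instructions`; fill the remaining budget with the newest tokens of `rest`."""
    if len(instructions) >= window:
        raise ValueError("instructions leave no room for anything else")
    budget = window - len(instructions)
    return instructions + rest[-budget:]


# Hypothetical tokens, same toy setup: 3 instruction tokens, window of 6.
instructions = ["SYS:translate", "SYS:to", "SYS:French"]
rest = ["hello", "world", "out0", "out1", "out2", "out3"]

context = pinned_context(instructions, rest, window=6)
print(context)
```

The instructions survive, but in this toy run the user's input ("hello", "world") is what gets pushed out instead, which is one example of the kind of failure mode this approach can trade into.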

To really fix this, you need to find a way to fully isolate instructions, input data, and output.

>So is patching this as simple as not allowing the entire space of X for user prompt?

>No

Isn't the answer yes?

>The "obvious" fix for this is to ensure that their instructions are always within their horizon.

That's what I take GP to be suggesting. Any possible failure mode that could result from doing this is less serious than allowing top-level instructions to be pushed out, surely?