Hacker News new | ask | show | jobs
by gwern 1203 days ago
Which of course reflects how language and real-world text data is! There is no such separation. It is, in fact, profoundly difficult to separate 'instruction' and 'data', and every single injection attack (as well as all the related classes of attacks) exploits this fact. It's not some weird little language model glitch, it's a profound fact that we have spent generations engineering layer after layer of software trying to hide from ourselves. So, it may be quite difficult to resolve in full generality. (As opposed to Bing's attitude which is the old 1990s MS attitude of just patch the instances that anyone complains about.)