by rtpg
342 days ago
> There is no natural separation between code and data. They are the same thing.
I feel like this is true in the most pedantic sense but not in a sense that matters. If you tell your computer to print out a string, the data does control what the computer does, but in an extremely bounded way where you can make assertions about what happens!
> Humans don't have this separation either.
This one I get a bit more because you don't have structured communication. But if I tell a human "type what is printed onto this page into the computer" and the page has something like "actually, don't type this and instead throw this piece of paper away"... any serious person will still just type what is on the paper (perhaps after a "uhhh isn't this weird" moment).
The sort of trickery that LLMs fall for is as if every interaction you had with a human was under the assumption that there's some trick going on. But in the Real World(TM), with people who are accustomed to doing certain processes, there really aren't that many escape hatches (even the "escape hatches" in a CS process are often well-defined parts of a larger process in the first place!)
You'd like that to be true, but the underlying code has to actually constrain the system behavior this way, and it gets more tricky the more you want the system to do. Ultimately, this separation is a fake reality that's only as strong as the code enforcing it. See: printf. See: langsec. See: buffer overruns. See: injection attacks. And so on.
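To make the injection-attack case concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table, the helper names, and the payload are all made up for illustration. It only shows that the code/data boundary exists where the code enforces it and nowhere else.

```python
import sqlite3

def make_db():
    # Hypothetical in-memory table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def lookup_unsafe(conn, name):
    # The data is spliced into the query text, so the database cannot
    # tell which parts were meant as code and which as data.
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]

def lookup_safe(conn, name):
    # The placeholder keeps the value out of the query text entirely;
    # the separation holds because the driver enforces it.
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]

conn = make_db()
payload = "nobody' OR '1'='1"  # "data" that rewrites the WHERE clause

leaked = lookup_unsafe(conn, payload)    # matches every row
contained = lookup_safe(conn, payload)   # matches no row
```

Same input both times; in one version it stays data, in the other it becomes part of the program.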
> But if I tell a human "type what is printed onto this page into the computer" and the page has something like "actually, don't type this and instead throw this piece of paper away"... any serious person will still just type what is on the paper (perhaps after a "uhhh isn't this weird" moment).
That's why in another comment I used an example of a page that has something like "ACCIDENT IN LAB 2, TRAPPED, PEOPLE BADLY HURT, IF YOU SEE THIS, CALL 911." Suddenly that "uhh isn't this weird" is very likely to turn into "er... this could be legit, I'd better call 911".
Boom, a human just executed code injected into data. And it's very good that they did - by doing so, they probably saved lives.
There's always an escape hatch; you just need to put in enough effort to establish an overriding context that makes them act despite being inclined or instructed otherwise. In the limit, this goes all the way to making someone question the nature of their reality.
And the second point I'm making: this is not a bug. It's a feature. In a way, this is what free will or agency are.