by jerf 1125 days ago
I fully expect this problem will be solved by AIs that use language models as a component of themselves, but aren't just language models, and that history will laugh at the way we've been behaving for the last few months for thinking that a language model alone can do these things. Or more subtly, that it should be able to do these things, when it will be obvious to everyone that they can't and never could.

I have a language model as a component of myself, and I don't expect my language model to perform these tasks. It's a separate system that takes my language model's comprehension of "Simon says raise your right hand" and decides whether or not the right hand should be raised... and I pick that example precisely as a demonstration that it isn't perfect either. But it's not my language model alone performing that task.

1 comment

> It's a separate system that takes my language model's comprehension of "Simon says raise your right hand" and decides whether or not the right hand should be raised... and I pick that example precisely as a demonstration that it isn't perfect either. But it's not my language model alone performing that task.

Sure, but this example actually reinforces my point. If I can trick you into recontextualizing your situation, that same system will make you override whatever "system prompt" your language-level component has, or whatever values you hold dear.

Among humans, examples abound. From deep believers turning atheist (or the other way around) because of some new information or experience, to people ignoring the directions of their employer and blowing the whistle when they discover something sufficiently alarming happening at work. "Simon says raise your right hand" is different when you have reasons to believe Simon has a gun to the head of someone you love. Etc.

Ultimately, I agree we could use non-LLM components to box in LLMs, but that a) doesn't address the problem of LLMs themselves being "prompt-injectable", and b) will make your system stop being a generic AI. That is, I have a weak suspicion that you can't solve this kind of problem and still have a fully general AI.