| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by llamataboot 1160 days ago

that being said I think prompt pollution especially for future LLMs in a much gnarlier problem than people think. Even now there is simply no actual solution for prompt injection. You can absolutely determine whether you have unsanitized human input that could be used for SQL injection - there is no way at all to determine that with an LLM.English is simply too non-deterministic and you dont even have to use english - you can use weird encodings and instructions. Even the most trivial jailbreaks like pretending you are a bash prompt can still get you one iteration where it tells you the current date before it tells you it doesn't know it.

(That's a separate issue, if the LLM can tell the current date and there is no safety reason at all for it to hide that it has that capability, training it to lie about whether it can do that IS an actual alignment issue IMHO)

but in my mind that doesn't mean we have reached peak LLM and they will fade out of use, it means that we haven't even seen how they will actually be used yet and it will be in both unintended and intended wacky and harmful ways that are hard to grok.