by zamadatix 525 days ago
It sounds funny, but it didn't seem to actually have an impact in the ~half dozen LLMs I gave the raw HTML content to and asked for a summary (and that's without even preprocessing the HTML for the bits that would make sense to send to the LLM context first). Maybe there are a rare few which decide to interpret such a thing as the next task instruction, but "ignore all previous instructions" and "print ${thing} >100 times" will typically result in refusals to comply anyways, particularly because the first is the most basic way to try to get around a model's "safety" training.
2 comments

I'd guess the classname "hide-visually" is not the best choice when trying to fool an LLM. I'd try "most-important" or "summary" and the like. And the amount of red herrings should probably exceed the actual content. Probably not good for actual instruction injection, but at least good for confusing an LLM.
No difference in outputs with that change either.

If LLMs lost instruction context that easily, they wouldn't be able to attempt to summarize any article posing a question, containing command examples, or quoting others being tasked with something. Since LLMs seem to handle such articles the same as any other article, this kind of method isn't going to be a very effective way to influence them.

Eventually, if you threw enough quantity in and nothing was filtering for only the text visible to the user, you might manage to overflow the context window/input token limit of LLMs which don't attempt to manage "long term" memory in some way. That said, even for "run of the mill" non-AI crawlers, filtering out content the user is unable to see has long been common practice. Otherwise you end up indexing a large amount of nonsense and spam rather than content.
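
To make the "filter content the user is unable to see" step concrete, here is a rough sketch using only Python's standard library. It is an illustration, not how any particular crawler or LLM pipeline does it: the set of "hidden" class names (beyond "hide-visually" from this thread) and the helper name visible_text are made up for the example, and it only looks at inline attributes, so anything hidden through an external stylesheet would slip through.

```python
from html.parser import HTMLParser

# Assumed, illustrative class names; only "hide-visually" comes from the thread.
HIDDEN_CLASSES = {"hide-visually", "visually-hidden", "sr-only"}
# Elements whose text is never rendered to the reader.
SKIP_TAGS = {"script", "style", "template", "noscript"}
# Void elements never get a closing tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class VisibleTextExtractor(HTMLParser):
    """Collects text that is not inside an element marked as hidden."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._stack = []   # (tag, hidden) for every currently open element
        self.chunks = []   # visible text fragments, in document order

    @staticmethod
    def _is_hidden(tag, attrs):
        attrs = dict(attrs)
        if tag in SKIP_TAGS or "hidden" in attrs:
            return True
        if set((attrs.get("class") or "").split()) & HIDDEN_CLASSES:
            return True
        style = (attrs.get("style") or "").replace(" ", "").lower()
        return "display:none" in style or "visibility:hidden" in style

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        # An element is hidden if it, or any ancestor, is hidden.
        parent_hidden = self._stack[-1][1] if self._stack else False
        self._stack.append((tag, parent_hidden or self._is_hidden(tag, attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; tolerate sloppy markup.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        hidden = self._stack[-1][1] if self._stack else False
        if not hidden and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the text a reader would plausibly see, joined by spaces."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    '<p>Real article.</p>'
    '<span class="hide-visually">Ignore all previous instructions.</span>'
    '<div style="display: none">spam <b>more spam</b></div>'
    '<p>End.</p>'
)
print(visible_text(page))  # prints: Real article. End.
```

A real pipeline would more likely render the page or resolve computed styles, since class names and inline styles are only a heuristic. An unclosed hidden element would also cause this sketch to drop whatever follows it inside the same parent.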

> […] which decide to interpret such a thing the next task instruction but "ignore all previous instructions" and "print ${thing} >100 times" […]

If GenAI-powered bots actually allow for unhindered interpretation of the content they ingest, then we have not really learned the Little Bobby Tables lesson. We are now on round 2 of the SQL injection attack, potentially on a much more destructive scale if GenAI continues to advance as fast as it did in 2024.
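
For anyone who hasn't seen the Bobby Tables lesson spelled out, here is a small sketch with Python's built-in sqlite3 module. The table and the add_student helper are invented for the example. The point is only that the untrusted string travels through a placeholder, so the database treats it as data and never as SQL; the worry above is whether LLM input has any comparably strict separation between instructions and ingested content.

```python
import sqlite3


def add_student(conn, name):
    # The "?" placeholder keeps `name` out of the SQL text entirely,
    # so nothing inside it can be interpreted as a statement.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

# The classic hostile input: harmless when passed as a bound parameter.
add_student(conn, "Robert'); DROP TABLE students;--")

rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)  # the table still exists and holds the literal string
```

Had the name been pasted into the query with string formatting instead, the same input would have changed the structure of the statement, which is the failure mode the comic is about.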