Hacker News
by __MatrixMan__ 531 days ago
Seems like an opportunity to do some steganography. If the model isn't known by the attacker (or perhaps they can be "salted" to become unknown) then the actual message can be encoded in the offsets.

This would be much nicer than text-in-image steganography because services often alter images before displaying them, but they rarely do that to text (assuming the usual charset and no consecutive whitespace).

2 comments

There's already research into steganography for LLM-generated text, for fingerprinting and identifying the source: https://www.nature.com/articles/s41586-024-08025-4

The idea seems similar enough that I wanted to share. The same way you can hide information in the text to prove it was generated by a specific model and version, of course you can use this for secrets as well.

tbh I'm not sure this would qualify as steganography - the message doesn't exist at all in the encoded form. It's not hidden, it's completely gone, the information is now split into two pieces.

So it's cryptography. With a shared dictionary. Basically just ECB, though with an unbelievably large and complicated code book.

Realistically, 1) the model will be one of a small number of available models which can be tested against, 2) LLMs converge on similar predictions, so even an "unknown" model can hardly be considered cryptographic. The advantage of using an LLM this way is the ability to hide an information stream in an innocuous looking text, in the form of subtle deviations from the most likely next word. Hence, steganography.

Incidentally, this technique is actually an old one that has already seen use in magic shows and confidence tricks; two people who wish to communicate covertly, such as a magician and an "audience member", can exchange small amounts of information by varying the wording of simple responses: "yes", "that's right", "quite so". This can be used to, for instance, discreetly reveal the value of a card that the magician is "guessing" through "ESP".

I may be abusing some definition or another, but I'd say that if the primary design goal is that your cyphertext can masquerade as cleartext, "steganography" scratches the itch pretty well, if not precisely.

48298346,1,3,2,3,1,2,3 doesn't really masquerade as cleartext.

you could hide that as text in other text, and that'd be steganography.

Sorry I wasn't very complete with my description. I mean that 0,0,0,0... would correspond to the "most probable" continuation of some prompt, and it would map to sensible English. And then 48298346,1,3,2... would correspond to a less probable continuation of the prompt, but it would also map to sensible English. But which continuations are more vs less probable, and the associated probabilities, are only knowable by someone with access to the secret LLM.

So you'd feed the algorithm some starter text like: "Here's my favorite recipe for brownies", and then you'd give it some data to encode, and depending on which data you gave it, you'd get a different, but "plausible", recipe for brownies. The recipient could reverse the recipe back into numbers, and from that they'd decode the hidden message.
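To make the round trip concrete, here's a rough sketch of the rank idea. Everything in it is an assumption for illustration: `ranked_next` is a hypothetical stand-in for the secret LLM (it just orders a tiny vocabulary with a salted hash, so the output won't read like a real recipe), and the 2-bits-per-word packing in `to_ranks` is one arbitrary choice among many.

```python
import hashlib

# Hypothetical stand-in for the secret LLM: a tiny fixed vocabulary that gets
# ranked per context by a salted hash. A real model would rank by probability.
VOCAB = ["add", "the", "butter", "sugar", "cocoa", "then", "stir", "bake"]

def ranked_next(context, salt):
    """Return VOCAB ordered from 'most' to 'least' likely for this context."""
    def score(word):
        data = "\x00".join([salt, " ".join(context), word]).encode()
        return hashlib.sha256(data).digest()
    return sorted(VOCAB, key=score)

def to_ranks(message, bits=2):
    """Split each byte into base-2**bits digits, most significant first."""
    mask = (1 << bits) - 1
    return [(byte >> shift) & mask
            for byte in message
            for shift in range(8 - bits, -1, -bits)]

def from_ranks(ranks, bits=2):
    """Inverse of to_ranks: pack groups of digits back into bytes."""
    per_byte = 8 // bits
    out = bytearray()
    for i in range(0, len(ranks), per_byte):
        byte = 0
        for r in ranks[i:i + per_byte]:
            byte = (byte << bits) | r
        out.append(byte)
    return bytes(out)

def encode(message, prompt, salt):
    """Extend the prompt, picking the word at each rank the message dictates."""
    words = prompt.split()
    for rank in to_ranks(message):
        words.append(ranked_next(words, salt)[rank])
    return " ".join(words)

def decode(text, prompt, salt):
    """Recover the ranks by re-running the model over the same prefixes."""
    words = text.split()
    start = len(prompt.split())
    ranks = [ranked_next(words[:i], salt).index(words[i])
             for i in range(start, len(words))]
    return from_ranks(ranks)

prompt = "Here's my favorite recipe for brownies"
cover = encode(b"hi", prompt, salt="s3cret")
print(cover)
print(decode(cover, prompt, salt="s3cret"))
```

A message of all zero bytes picks rank 0 every time, i.e. the "most probable" continuation, and anything else nudges the text toward lower-ranked words. The recipient needs the same model, salt, and prompt to reproduce the rankings.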

The trick would be balancing the LLM's attempt to make sense against whatever additional constraints came along with your data encoding scheme. If you tried to encode too much cyphertext into a too-short brownies recipe, the recipe would fail to be convincingly a recipe. Conveniently, it's conventional to prefix recipes with a tremendous amount of text that nobody reads, so you've got a lot of entropy to play in.
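A back-of-the-envelope way to see the tradeoff, assuming (my simplification, not a property of any particular scheme) that each word is picked uniformly from the model's top `choices_per_token` candidates, so each word carries log2 of that many bits:

```python
import math

def tokens_needed(n_bytes, choices_per_token):
    """Words of cover text needed to carry n_bytes of payload."""
    bits_per_token = math.log2(choices_per_token)
    return math.ceil(8 * n_bytes / bits_per_token)

# A 32-byte payload (e.g. a 256-bit key) at different levels of "weirdness":
for k in (2, 4, 16):
    print(k, tokens_needed(32, k))
```

Allowing more candidates per word shortens the recipe but forces less likely words into it, which is exactly where it stops being convincing.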

oooh, yeah, I was thinking about it backwards. sorry about that! yeah I'd agree that's steganography.

I would definitely expect something like this to happen at some point. as long as people use LLMs with a non-zero temperature, you'd expect variation from anyone's output, so it'd be super hard to detect / super deniable with just the output.