by textninja
744 days ago
The weights of the LLM become the private key (so it had better be a pinned version of a model with open weights), and for most practical applications (i.e. unless you're willing to complicate your setup with fancy applied statistics and error correction) you'd have to use a temperature of 0 as a baseline. Then, having done all that, such steganography may be detectable using this very tool, by again encoding the difference between the LLM's prediction and the ground truth but searching for substrings with low entropy instead!
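The comment doesn't say exactly what "this very tool" computes, so what follows is only a minimal sketch of the "search for low-entropy substrings" idea. It assumes you can already get, for every token of a suspect text, the probability the pinned model assigned to that token. `surprisals` and `lowest_entropy_window` are hypothetical helpers, and the probabilities in the example are made up for illustration. Text generated at temperature 0 always follows the model's top-ranked token, so it tends to show up as a run of unusually low surprisal.

```python
import math
from typing import List, Tuple

def surprisals(probs: List[float]) -> List[float]:
    """Per-token surprisal in bits, -log2(p), given the probability the
    model assigned to each token that actually appears in the text."""
    return [-math.log2(p) for p in probs]  # every p must be > 0

def lowest_entropy_window(probs: List[float], width: int) -> Tuple[int, float]:
    """Return (start index, mean surprisal in bits) of the run of `width`
    consecutive tokens that the model found most predictable."""
    s = surprisals(probs)
    if not 1 <= width <= len(s):
        raise ValueError("width must be between 1 and len(probs)")
    start = min(range(len(s) - width + 1), key=lambda i: sum(s[i:i + width]))
    return start, sum(s[start:start + width]) / width

# Made-up probabilities: tokens 2..5 are ones the model strongly expected.
probs = [0.2, 0.05, 0.9, 0.9, 0.9, 0.9, 0.3, 0.1]
start, mean_bits = lowest_entropy_window(probs, 4)
print(start)  # 2  (mean surprisal about 0.15 bits per token)
```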
Here's how I would do this:
Use some LLM; its weights need to be known to both parties in the communication.
Producing text with the LLM means repeatedly feeding the LLM with the text-so-far to produce a probability distribution for the next token. You then use a random number generator to pick a token from that distribution.
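The ordinary generation loop described above can be sketched like this. `next_token_probs` is a hypothetical toy model standing in for the LLM (it only looks at the previous token, where a real model sees the whole context), and `random.Random` plays the role of the random number generator. With the same seed you get the same text, which is the role the ciphertext takes over in the steganographic version.

```python
import random
from typing import Dict, List

def next_token_probs(context: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for the shared LLM: next-token probabilities
    that depend only on the previous token."""
    last = context[-1] if context else "."
    if last == ".":
        return {"the": 0.6, "a": 0.3, "one": 0.1}
    if last in ("the", "a", "one"):
        return {"cat": 0.5, "dog": 0.4, "bird": 0.1}
    if last in ("cat", "dog", "bird"):
        return {"sat": 0.4, "ran": 0.3, "slept": 0.3}
    return {".": 0.7, "again": 0.3}

def generate(rng: random.Random, n_tokens: int) -> List[str]:
    """Ordinary sampling: feed the text so far to the model, get a
    distribution, and let the random number generator pick a token."""
    context: List[str] = []
    for _ in range(n_tokens):
        dist = next_token_probs(context)
        tokens = list(dist)
        weights = [dist[t] for t in tokens]
        context.append(rng.choices(tokens, weights=weights, k=1)[0])
    return context

print(" ".join(generate(random.Random(42), 12)))
```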
If you want to turn this into steganography, you first take your cleartext and encrypt it with any old encryption system. The resulting bitstream should be random-looking, if your encryption ain't broken. Now you take the LLM mechanism I described above, but instead of sampling via a random number generator, you use your ciphertext as the source of entropy. (You need to use something like arithmetic coding to convert between your uniformly random-looking bitstream and the heavily weighted choices you make to sample your LLM. See https://en.wikipedia.org/wiki/Arithmetic_coding)
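A minimal sketch of the arithmetic-coding step, under several simplifying assumptions: `toy_model` is a hypothetical stand-in for the shared LLM that returns integer token frequencies, the arithmetic uses exact `Fraction`s instead of the fixed-precision renormalization a production arithmetic coder would use, and the receiver is assumed to already know the ciphertext length in bits. The sender treats the ciphertext as a binary fraction and arithmetic-decodes it into tokens, always picking the token whose slice of the current interval contains the midpoint of the target range; the receiver re-runs the same model over the tokens, arithmetic-encodes them, and reads the bits back off the final interval. `hide` and `reveal` are illustrative names, not an existing API.

```python
import math
from fractions import Fraction
from typing import List, Tuple

Dist = List[Tuple[str, int]]  # (token, integer frequency), in a fixed order

def toy_model(context: List[str]) -> Dist:
    """Hypothetical stand-in for the pinned LLM. Both parties must get
    exactly the same distribution for the same context."""
    last = context[-1] if context else "."
    if last == ".":
        return [("the", 6), ("a", 3), ("one", 1)]
    if last in ("the", "a", "one"):
        return [("cat", 5), ("dog", 4), ("bird", 1)]
    if last in ("cat", "dog", "bird"):
        return [("sat", 4), ("ran", 3), ("slept", 3)]
    return [(".", 7), ("again", 3)]

def sub_interval(low: Fraction, high: Fraction, dist: Dist, i: int) -> Tuple[Fraction, Fraction]:
    """Slice of [low, high) belonging to token i, proportional to its frequency."""
    total = sum(c for _, c in dist)
    before = sum(c for _, c in dist[:i])
    width = high - low
    return (low + width * Fraction(before, total),
            low + width * Fraction(before + dist[i][1], total))

def hide(bits: str, model=toy_model) -> List[str]:
    """Sender: arithmetic-decode the ciphertext bits into tokens."""
    n = len(bits)
    target_low = Fraction(int(bits, 2), 2 ** n)      # the bits as a binary fraction
    target_high = target_low + Fraction(1, 2 ** n)
    x = target_low + Fraction(1, 2 ** (n + 1))       # midpoint of the target range
    low, high = Fraction(0), Fraction(1)
    tokens: List[str] = []
    # Emit tokens until the interval lies inside the target, i.e. pins down all n bits.
    while not (target_low <= low and high <= target_high):
        dist = model(tokens)
        for i, (tok, _) in enumerate(dist):
            lo, hi = sub_interval(low, high, dist, i)
            if lo <= x < hi:                         # the token whose slice holds x
                low, high = lo, hi
                tokens.append(tok)
                break
    return tokens

def reveal(tokens: List[str], n_bits: int, model=toy_model) -> str:
    """Receiver: arithmetic-encode the tokens and read off the first n_bits bits."""
    low, high = Fraction(0), Fraction(1)
    for pos, tok in enumerate(tokens):
        dist = model(tokens[:pos])
        i = [t for t, _ in dist].index(tok)
        low, high = sub_interval(low, high, dist, i)
    return format(math.floor(low * 2 ** n_bits), f"0{n_bits}b")

ciphertext = "1011001110001111"  # stand-in for real ciphertext bits
cover = hide(ciphertext)
print(" ".join(cover))
assert reveal(cover, len(ciphertext)) == ciphertext
```

Stopping as soon as the interval lies inside the target range means the sender emits only as many tokens as the message needs. Each token narrows the interval by its probability, so steps where the model is confident carry fewer hidden bits.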
Almost any temperature will work, as long as it is known to both sender and receiver. (The 'temperature' parameter can be used to change the distribution, but it's still effectively a probability distribution at the end. And that's all that's required.)
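To illustrate the temperature point, here is a sketch assuming the model exposes raw logits (the numbers below are made up). `softmax_with_temperature` and `entropy_bits` are hypothetical helpers. Whatever positive temperature is chosen, the output still sums to 1, so an arithmetic-coding scheme works unchanged as long as both sides use the same value. Lower temperatures concentrate the mass on the top token and lower the entropy, i.e. roughly fewer hidden bits per token; in the limit T → 0 the distribution collapses onto the argmax, and a token chosen with probability 1 carries no hidden bits at all.

```python
import math
from typing import List

def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Rescale logits by 1/temperature and normalize. Whatever the
    temperature, the result is still a probability distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (T -> 0 approaches argmax)")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs: List[float]) -> float:
    """Shannon entropy in bits: roughly how many hidden bits an arithmetic
    coder can place in one token drawn from this distribution, on average."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # made-up logits for three candidate tokens
for t in (0.25, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy_bits(probs), 3))
```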