Hacker News

by rahmeero 1165 days ago

I can see how steganography applied to images can result in hard-to-detect watermarks or provenance identifiers.

But I don't see how these can be effectively used in text content. Yes, an AI program can encode provenance identifiers through word lengths, the starting letters of sentences, the use of specific suffixes, and other linguistic constructs.

However, say that I am a student with an AI-generated essay and want to make sure my essay passes the professor's plagiarism checker. Isn't it pretty easy to re-order clauses, substitute synonyms, and add new content? In fact, I think there is even a Chrome extension that does something like that.
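
To make the word-length idea concrete, here is a minimal Python sketch of a hypothetical naive scheme of the kind described above (not the paper's method): each word carries one bit, namely whether its length is odd. The function name `length_parity_bits` and the example sentences are illustrative assumptions. A single synonym swap is enough to flip a bit of the tag.

```python
def length_parity_bits(text: str) -> str:
    """Hypothetical naive watermark: one bit per word, 1 if the word's length is odd."""
    return "".join(str(len(word.strip(".,;:!?")) % 2) for word in text.split())


original = "The essay argues that steganography hides messages well."
# Synonym substitutions: "essay"->"paper" and "argues"->"claims" keep the lengths,
# but "hides"->"conceals" changes length parity and therefore flips one bit.
edited = "The paper claims that steganography conceals messages well."

print(length_parity_bits(original), length_parity_bits(edited))  # → 11001100 11001000
```

Reordering clauses would scramble such a tag even more thoroughly, since the bits are tied to word positions.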

Or maybe that is too much work for the lazy student who wants to completely rely on ChatGPT or doesn't know any better.

3 comments

staunton 1165 days ago

The point of steganography (as discussed in the paper) is not unerasable watermarks but undetectable (to the adversary) messages in innocent-looking communication.

I'm confused why you focus on plagiarism detection. That being said, your scenario is very briefly mentioned in the conclusion and requires augmenting the approach (entropy coding) with error correction.
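
As a rough illustration of the error-correction idea, here is a minimal Python sketch using the simplest possible code, a 3x repetition code decoded by majority vote. It is an assumption chosen for clarity, not the construction from the paper, and it only corrects bit flips; insertions and deletions such as those caused by reordering need different kinds of codes. The cost is visible directly: every payload bit uses three channel bits.

```python
def rep_encode(bits: str, r: int = 3) -> str:
    """Repeat every bit r times (r odd), spending r channel bits per payload bit."""
    return "".join(b * r for b in bits)


def rep_decode(bits: str, r: int = 3) -> str:
    """Majority vote over each complete block of r bits."""
    out = []
    for i in range(0, len(bits) - len(bits) % r, r):
        block = bits[i:i + r]
        out.append("1" if block.count("1") > r // 2 else "0")
    return "".join(out)


def flip(bits: str, positions: set[int]) -> str:
    """Simulate channel noise (e.g. an edit to the cover text) by flipping chosen bits."""
    return "".join(("1" if b == "0" else "0") if i in positions else b
                   for i, b in enumerate(bits))


payload = "1011"
sent = rep_encode(payload)       # 12 channel bits carry 4 payload bits
received = flip(sent, {1, 7})    # one error in block 0 and one in block 2
print(rep_decode(received))      # → 1011
```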

The result would be that, as long as your modifications (reordering clauses, etc.) reasonably closely follow a known distribution with limited entropy (which I think they clearly do, although specifying this distribution and dealing with the induced noisy channel might be very hard), there will be a way to communicate covertly despite them, though probably only a very small amount of information can be transmitted reliably. For plagiarism detection, you only need a number of bits that scales like -log(desired false-positive rate), so it would seem theoretically possible. Theoretically it also doesn't matter whether you use text or images, though in practice a larger amount of transmitted data should make the task a lot easier. However, I'm not sure whether something like this can be practically implemented using existing methods.
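
The -log scaling can be checked with a small calculation. Assuming the detector looks for a fixed k-bit tag, and that bits read from unwatermarked text are independent and uniformly random, an innocent text matches by chance with probability 2^-k, so k = ceil(-log2(FPR)) bits are enough. A minimal Python sketch (the function name is hypothetical):

```python
import math


def watermark_bits_needed(false_positive_rate: float) -> int:
    """Smallest k with 2**-k <= false_positive_rate, assuming uniform bits in innocent text."""
    if not 0 < false_positive_rate < 1:
        raise ValueError("false_positive_rate must be in (0, 1)")
    return math.ceil(-math.log2(false_positive_rate))


for fpr in (1e-2, 1e-6, 1e-9):
    print(f"FPR {fpr:g}: {watermark_bits_needed(fpr)} bits")  # 7, 20 and 30 bits respectively
```

Even a one-in-a-billion false-positive rate needs only about 30 bits, which is why a low-capacity but robust channel could still suffice in principle.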


pmoriarty 1165 days ago

Instead of manually reordering clauses, etc., you could just run the original essay through another LLM without watermarking capability and ask it to write a new essay based on the original.

Then test the result against your own plagiarism detector and iterate through the watermark-less LLM until the resulting essay passes.

Or just proactively run it through a bunch of times.

Or just use the watermark-less LLM to begin with. Personal, unshackled, powerful LLMs are definitely where we're headed.


nullc 1165 days ago

This is fairly theoretical work. It assumes that the parties (and the adversary) know the distribution precisely.

Its direct practical utility may be somewhat limited because people aren't going around communicating randomly selected LLM outputs... and if you use LLM output in a context where human-written text would be expected, it could be distinguished.

It's not useful for watermarking as the first change will destroy all the rest of the embedding.
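
A toy Python sketch of why one edit ruins everything after it. It assumes a hypothetical "model" whose token probabilities are all powers of 1/2 (1/2, 1/4, 1/8, 1/8), so hidden bits map to tokens through a prefix code; this is a simplification, not the paper's construction. Replacing one word with another whose codeword has a different length shifts every later bit by one position. In a real autoregressive LLM the damage is worse, since every later token's distribution is conditioned on the edited context.

```python
# Hypothetical dyadic toy "model": P(token) = 2 ** -len(codeword).
CODE = {"the": "0", "a": "10", "an": "110", "this": "111"}
DECODE = {c: t for t, c in CODE.items()}


def bits_to_tokens(bits: str) -> list[str]:
    """Decode bits with the prefix code; assumes bits end on a codeword boundary."""
    tokens, buf = [], ""
    for b in bits:
        buf += b
        if buf in DECODE:
            tokens.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("bits do not end on a codeword boundary")
    return tokens


def tokens_to_bits(tokens: list[str]) -> str:
    return "".join(CODE[t] for t in tokens)


def first_mismatch(a: str, b: str) -> int:
    """Index of the first differing bit (or the shorter length if one is a prefix)."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))


hidden = "0101101110010110"
cover = bits_to_tokens(hidden)        # ['the', 'a', 'an', 'this', 'the', 'the', 'a', 'an']
edited = cover.copy()
edited[1] = "the"                     # a single "synonym substitution" of the second word
recovered = tokens_to_bits(edited)

print(first_mismatch(hidden, recovered))  # → 1
print(recovered[2:] == hidden[3:])        # → True: every later bit is now one position off
```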

I can make a contrived example where it's directly useful. Imagine you have agents in the field: you could send out LLM-generated spam to communicate with them. Everyone expects the spam to be LLM-generated, so it's not revealing that it's detectable as such. This work discusses how you can make the spam carry secret messages to the agents in a way that is impossible to detect (without the key, of course), even by an attacker who has the exact spam LLM.
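
A minimal Python sketch of the general idea, under loudly stated assumptions: the "spam LLM" is replaced by a hypothetical fixed distribution whose probabilities are powers of 1/2, so a prefix code can turn uniform bits into exactly distributed tokens, and the keystream is derived from a shared key with SHAKE-256 purely for illustration. Encrypting first makes the bits look uniformly random, and decoding uniform bits with this code samples tokens with exactly the model's probabilities, so someone holding the model but not the key sees ordinary model output. This is a simplification, not the paper's construction; realistic distributions are not powers of 1/2 and would need something like arithmetic coding.

```python
import hashlib
import secrets

# Hypothetical toy "model": one fixed next-token distribution, P(token) = 2 ** -len(code),
# i.e. 1/2, 1/4, 1/8, 1/8. A real LLM would supply a new distribution at every step.
PREFIX_CODE = {"the": "0", "a": "10", "an": "110", "this": "111"}
DECODE = {code: tok for tok, code in PREFIX_CODE.items()}


def keystream_bits(key: bytes, n: int) -> str:
    """Toy keystream from a shared key (illustration only; no nonce, so never reuse a key)."""
    data = hashlib.shake_256(key).digest((n + 7) // 8)
    return "".join(f"{b:08b}" for b in data)[:n]


def xor_bits(a: str, b: str) -> str:
    return "".join("1" if x != y else "0" for x, y in zip(a, b))


def embed(message_bits: str, key: bytes) -> list[str]:
    """Encrypt, then use the ciphertext bits as the sampler's randomness."""
    bits = xor_bits(message_bits, keystream_bits(key, len(message_bits)))
    tokens, buf, i = [], "", 0
    while i < len(bits) or buf:
        # Once the ciphertext runs out, finish the last codeword with fresh random bits.
        buf += bits[i] if i < len(bits) else str(secrets.randbits(1))
        i += 1
        if buf in DECODE:
            tokens.append(DECODE[buf])
            buf = ""
    return tokens


def extract(tokens: list[str], n_bits: int, key: bytes) -> str:
    """Re-encode tokens to bits, drop the random padding, and decrypt."""
    bits = "".join(PREFIX_CODE[t] for t in tokens)[:n_bits]
    return xor_bits(bits, keystream_bits(key, n_bits))


key = b"shared secret"
message = "1011001110001111"
cover = embed(message, key)
print(" ".join(cover))
assert extract(cover, len(message), key) == message
```

The receiver must know the message length in advance here; a fuller design would encode it or use an end marker.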

Less contrived: a sufficiently short message from a sufficiently advanced LLM is probably indistinguishable from a real post in practice, but that's outside the scope of the paper. It's hard (impossible?) to rigorously analyze that security model because we can't say much about the distribution of "real text", so we can't say how far LLM output is from it. The best models we have of the distribution of real text are these LLMs; if you take them to BE the distribution of real text, then that approach is perfectly secure by definition.
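
One standard way to make "how far from it" precise is Cachin's information-theoretic definition of steganographic security, which bounds the relative entropy (KL divergence) between the cover-text and stego-text distributions. The Python sketch below uses made-up toy distributions (both dictionaries are assumptions). If stego text is sampled exactly from the model and the model is taken to be the cover distribution, the divergence is zero; any gap between the model and real text would show up as a positive number, which we cannot actually compute because the real distribution is unknown.

```python
import math


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """D(P || Q) in bits; infinite if P puts mass where Q has none."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total


# Hypothetical toy next-word distributions.
real_text = {"the": 0.45, "a": 0.30, "an": 0.15, "this": 0.10}
llm = {"the": 0.50, "a": 0.25, "an": 0.125, "this": 0.125}

print(kl_divergence(llm, llm))        # → 0.0: the model taken as the cover distribution
print(kl_divergence(real_text, llm))  # positive: the (hypothetical) model-vs-real-text gap
```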

But really, even if the situation it solves is too contrived to be a gain over alternatives, the context provides an opportunity to explore the boundary of what is possible.


sacnoradhq 1165 days ago

At a higher level of abstraction, generative AI will soon be able to design algorithms that humans cannot possibly comprehend, going well beyond rudimentary newspaper want-ad spycraft.

Also, on image-based steganography:

https://www.provos.org/p/outguess/

https://www.provos.org/p/detection-with-stegdetect/
