by rahmeero
1118 days ago
I can see how steganography applied to images can result in hard-to-detect watermarks or provenance identifiers. But I don't see how these can be used effectively in text content. Yes, an AI program can encode provenance identifiers in the length of words, the starting letters of sentences, the use of specific suffixes, and other linguistic constructs. However, say that I am a student with an AI-generated essay and want to make sure my essay passes the professor's plagiarism checker. Isn't it pretty easy to re-order clauses, substitute synonyms, and add new content? In fact, I think there is even a Chrome extension that does something like that. Or maybe that is too much work for the lazy student who wants to rely completely on ChatGPT or doesn't know any better.
I'm confused about why you focus on plagiarism detection. That said, your scenario is mentioned very briefly in the conclusion, and it requires augmenting the approach (entropy coding) with error correction.
The result would be that, as long as your modifications (reordering clauses, etc.) follow a known distribution with limited entropy reasonably closely (which I think they clearly do, although specifying this distribution and dealing with the induced noisy channel might be very hard), covert communication remains possible despite them, though probably only a very small amount of information can be transmitted reliably. For plagiarism detection, you only need a number of bits that scales like -log(your desired false positive rate), so it would seem theoretically possible. Theoretically it also doesn't matter whether you use text or images, though in practice increasing the amount of transmitted data should make the task a lot easier. However, I'm not sure whether something like this can be practically implemented using existing methods.
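To make the scaling concrete, here is a rough Python sketch under some loud assumptions. It takes the log in base 2 and assumes an unmarked text matches a k-bit mark by chance with probability 2^-k. It also models the student's edits as plain bit flips and uses a repetition code as a stand-in for error correction. None of this is the scheme from the article; the function names are made up for illustration.

```python
import math


def bits_needed(false_positive_rate: float) -> int:
    """Smallest k with 2**-k <= false_positive_rate.

    Assumption: an unmarked text matches a k-bit mark with probability 2**-k.
    """
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be strictly between 0 and 1")
    return math.ceil(-math.log2(false_positive_rate))


def encode_repetition(bits: list[int], n: int = 5) -> list[int]:
    """Repeat every payload bit n times (a toy error-correcting code)."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(coded: list[int], n: int = 5) -> list[int]:
    """Recover each payload bit by majority vote over its block of n copies."""
    return [int(2 * sum(coded[i:i + n]) > n) for i in range(0, len(coded), n)]


if __name__ == "__main__":
    # A one-in-a-million false positive rate needs about 20 payload bits.
    print(bits_needed(1e-6))

    payload = [1, 0, 1, 1]
    coded = encode_repetition(payload, n=5)

    # Hypothetical "paraphrasing noise": flip a few of the embedded bits.
    for i in (0, 1, 7, 13, 19):
        coded[i] ^= 1

    # Majority voting still recovers the payload (at most 2 flips per block).
    print(decode_repetition(coded, n=5) == payload)
```

The payload stays tiny, but the redundancy multiplies the number of bits that have to be hidden in the text, which is where the "only a very small amount of information" caveat bites. Real edits such as reordering or inserting sentences behave more like deletions and insertions than bit flips, so an actual scheme would need a much more capable code than this.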