by sydon 1067 days ago
How would you go about watermarking AI written text?
7 comments

https://arxiv.org/pdf/2301.10226.pdf

Here's a decent paper on it.

It covers private watermarking (you can't detect it exists without a key), resistance to modifications, etc. Essentially you wouldn't know it was there and you can't make simple modifications to fool it.

OpenAI could already be doing this, and they could be watermarking with your account ID if they wanted to.

The current best countermeasure is likely paraphrasing attacks https://arxiv.org/pdf/2303.11156.pdf

I don't know.

I suppose hosted solutions like ChatGPT could offer an API where you copy some text in, and it searches its history of generated content to see if anything matches.

> bUt aCtuAlLy...

It's not like I don't know the bajillion limitations here. There are many audiences for detection. All of them are XY Problems. And the people asking for this stuff don't participate on Hacker News aka Unpopular Opinions Technology Edition.

There will probably be a lot of "services" that "just" "tell you" if "it" is "written by an AI."

One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading, like using a particular sentence length, paragraph length, or punctuation pattern. Or use certain words in the text that may not be frequently used by humans etc.

Watermarking needs to be subtle enough to be unnoticeable to opposing parties, yet distinctive enough to be detectable.

So, this is an arms race especially because detecting it and altering it based on the watermark is also fun :)

> One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading

This seems like a total non-starter. That can only negatively impact the answers. A solution needs to be totally decoupled from answer quality.

The paper I linked in the parent comment presents this as the "Simple proof of concept" on page 2 and, like you said, outlines its limitations: it hurts performance, and the pattern is easy to detect and work out.

Their improved method instead only replaces tokens when there are many good choices available, and skips replacing tokens when there are few good choices. Take "The quick brown fox jumps over the lazy dog": once you have "The quick brown", the next token is not replaceable because changing it would severely harm the quality.

Essentially it's only replacing tokens where it won't harm the performance.

It's worth noting that any watermarking will likely harm the quality to some degree - but it can be minimized to the point of being viable.
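
For anyone who wants to poke at it, here is a minimal sketch of the "soft" variant as I understand it from the paper: seed a RNG from the previous token, mark a fraction of the vocabulary "green", and add a small bias to the green logits before sampling. All names and values here (`KEY`, `GAMMA`, `DELTA`, `VOCAB_SIZE`, `green_list`) are my own assumptions for a toy setup, not the paper's code.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000     # assumed toy vocabulary size
GAMMA = 0.5           # fraction of the vocabulary marked "green" at each step
DELTA = 2.0           # bias added to green-token logits (illustrative value)
KEY = b"secret-key"   # hypothetical private key

def green_list(prev_token: int) -> set[int]:
    """Pseudo-randomly pick the green tokens, seeded by key + previous token."""
    digest = hashlib.sha256(KEY + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def watermarked_probs(logits, prev_token):
    """Add DELTA to green logits, then renormalize into probabilities."""
    green = green_list(prev_token)
    biased = [x + DELTA if i in green else x for i, x in enumerate(logits)]
    return softmax(biased)
```

When one token dominates the logits (the "quick brown fox" case), a bias of this size barely moves the distribution; when many tokens are about equally likely, most of the probability mass shifts to the green list.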

You can do this by injecting non-visible Unicode characters (LTR / RTL markers, zero-width separators, the various "space" analogs, homographs of "normal" characters), but they can obviously be stripped out.
Make half of the tokens (the AI's "dictionary") slightly more likely.

This would not impact output quality much, but it would only work for longish outputs. And the token probability "key" could probably be reverse engineered with enough output.
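
A rough sketch of the detection side, assuming the favored half is chosen by a keyed hash of each token (my assumption; `KEY`, `is_favored` and `z_score` are made-up names). Unwatermarked text should land in the favored half about 50% of the time, so you count hits and compute a one-proportion z-score:

```python
import hashlib
import math

KEY = b"secret-key"  # hypothetical secret that defines the favored half

def is_favored(token: str) -> bool:
    """Keyed hash splits the vocabulary roughly in half."""
    digest = hashlib.sha256(KEY + token.encode("utf-8")).digest()
    return digest[0] < 128

def z_score(tokens) -> float:
    """Standard deviations above the 50% hit rate expected by chance."""
    n = len(tokens)
    if n == 0:
        return 0.0
    hits = sum(is_favored(t) for t in tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```

The score grows with the square root of the length, which is why short outputs are hard to flag: even if every one of 4 tokens is favored the z-score is only 2, while 100 favored tokens out of 100 gives 10.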

It would be pretty easy to figure out by comparing against standard word frequencies in average datasets. Even then, the longer this system runs, the more likely it is to pollute its own dataset as people learn to write from GPT itself.
type=text/chatgpt :P
Invisible characters in a specific bit-pattern.

Pretty common steganographic technique, really.
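
A toy version, assuming two zero-width code points stand for bits 0 and 1 (the choice of characters and the `embed` / `extract` names are just for illustration):

```python
ZERO = "\u200b"  # ZERO WIDTH SPACE, used here for bit 0
ONE = "\u200c"   # ZERO WIDTH NON-JOINER, used here for bit 1

def embed(text: str, payload: bytes) -> str:
    """Hide payload bits as invisible characters after the first word."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    marks = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + marks + sep + tail

def extract(text: str) -> bytes:
    """Collect the invisible characters and rebuild whole bytes."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
```

Usage: `extract(embed("Hello world", b"id42"))` gives back `b"id42"`, and the marked string renders the same as the original in most fonts while being 32 characters longer.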

Can you elaborate on "invisible"? The only invisible character I can imagine is a space. It seems like any other character either isn't invisible or doesn't exist (i.e., isn't a character).

Additionally, if I copy-paste text like this are the invisible characters preserved? Are there a bunch of extra spaces somewhere?

When students try to evade plagiarism detectors, they will swap characters: replacing spaces with non-breaking spaces, replacing letters with lookalikes (I vs extended Cyrillic Ӏ etc.), and inserting things like the invisible 'Combining Grapheme Joiner'.

IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.
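
Such an "undo" pass is only a few lines. A minimal sketch, assuming a tiny hand-written lookalike table (`HOMOGLYPHS` is illustrative and nowhere near complete):

```python
import unicodedata

# Tiny illustrative lookalike map; a real one would be far larger.
HOMOGLYPHS = {
    "\u04c0": "I",  # Cyrillic palochka
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
}
CGJ = "\u034f"  # COMBINING GRAPHEME JOINER (category Mn, so handled explicitly)

def normalize(text: str) -> str:
    """Drop invisible format characters, map lookalikes, then apply NFKC."""
    kept = []
    for ch in text:
        # Category "Cf" covers zero-width spaces/joiners and LTR/RTL marks.
        if ch == CGJ or unicodedata.category(ch) == "Cf":
            continue
        kept.append(HOMOGLYPHS.get(ch, ch))
    # NFKC folds compatibility characters such as the no-break space.
    return unicodedata.normalize("NFKC", "".join(kept))
```

Note this is deliberately blunt: NFKC also rewrites things like ligatures and superscripts, and dropping every "Cf" character breaks emoji sequences that rely on the zero-width joiner.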

> IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.

It doesn't matter since there's no one-pass solution to counterfeiting.

You have the right of it-- the best you can hope for is adding more complexity to the product, which adds steps to their workflow and increases the chances of the counterfeiter overlooking any particular detail that you know to look for.

There's a bunch of different "spaces", one is a "zero-width space" which isn't visible but still gets copied with the text.

https://en.wikipedia.org/wiki/Zero-width_space

And the second site students will go to is zerospaceremover.com or whatever will show up to strip the junk.
So, all I have to do is copy-paste it into a text editor with remove-all-formatting to circumvent that?
If it's generated by a SaaS, the service could sign all output with its private key, so anyone could verify it with the matching public key.
This isn't a watermark though, the idea of a watermark is that it's inherently embedded in the data itself while not drastically changing the data
Why is this comment being downvoted?

OpenAI can internally keep a "hash" or a "signature" of every output it ever generated.

Given a piece of text, they should then be able to trace it back to the specific session (or set of sessions) in which it was generated.

Depending on the hit rate and the hashing methods used, they may be able to indicate the likelihood of a piece of text being generated by AI.

Why would they want to is my question. A single character change would break it.

Then you have database costs of storing all that data forever.

More so, it only works for OpenAI. I don't think it will be too long before other GPT-4-level models are around that won't give two shits about catering to the AI identification police.

> A single character change would break it.

That depends on how they hash the data, right? They can use various types of Perceptual Hashing [1] techniques which wouldn't be susceptible to a single-character change.

[1] https://en.wikipedia.org/wiki/Perceptual_hashing
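
To make that concrete, here is a small sketch using SimHash over word shingles. SimHash is just one fuzzy fingerprint I'm swapping in for illustration; I have no idea what OpenAI would actually use, and the function names are mine.

```python
import hashlib

def shingles(text: str, k: int = 3):
    """Overlapping k-word windows, case- and whitespace-insensitive."""
    words = text.lower().split()
    if len(words) < k:
        return [" ".join(words)]
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]

def simhash(text: str, bits: int = 64) -> int:
    """Each shingle votes +1/-1 on every bit; the sign of the tally is the bit."""
    tally = [0] * bits
    for sh in shingles(text):
        h = int.from_bytes(hashlib.sha256(sh.encode("utf-8")).digest()[:8], "big")
        for i in range(bits):
            tally[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if tally[i] > 0)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

A small edit only changes a few shingles, so the fingerprint of the edited text stays within a small Hamming distance of the original, while unrelated text differs in roughly half the bits. You would store the 64-bit fingerprints instead of the raw text and flag anything under some distance threshold.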

> Then you have database costs of storing all that data forever.

A database of all textual content generated by people? That sounds like a gold mine, not a liability. But as I've mentioned earlier, they don't need to keep the raw data (a perceptual hash is enough).

> won't give two shits about catering to the AI identification police

I'm sure there will be customers willing to pay for access to these checks, even if they're only limited to OpenAI's product (universities and schools - for plagiarism detection, government agencies, intelligence agencies, police, etc).

So other text should not be tagged as AI generated?