

	by sydon 1067 days ago
	How would you go about watermarking AI written text?

7 comments

thewataccount 1067 days ago

https://arxiv.org/pdf/2301.10226.pdf

Here's a decent paper on it.

It covers private watermarking (you can't detect it exists without a key), resistance to modifications, etc. Essentially you wouldn't know it was there and you can't make simple modifications to fool it.

OpenAI could already be doing this, and they could be watermarking with your account ID if they wanted to.

The current best countermeasure is likely paraphrasing attacks https://arxiv.org/pdf/2303.11156.pdf

doctorpangloss 1067 days ago

I don't know.

I suppose hosted solutions like ChatGPT could offer an API where you copy some text in, and it searches its history of generated content to see if anything matches.

> bUt aCtuAlLy...

It's not like I don't know the bajillion limitations here. There are many audiences for detection. All of them are XY Problems. And the people asking for this stuff don't participate on Hacker News aka Unpopular Opinions Technology Edition.

There will probably be a lot of "services" that "just" "tell you" if "it" is "written by an AI."

cateye 1067 days ago

One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading, like using a particular sentence length, paragraph length, or punctuation pattern. Or use certain words in the text that may not be frequently used by humans etc.

Watermarking needs to be subtle enough to be unnoticeable to opposing parties, yet distinctive enough to be detectable.

So, this is an arms race especially because detecting it and altering it based on the watermark is also fun :)

nonethewiser 1067 days ago

> One way is trying to sneak in a specific structure/pattern that is difficult for a human to notice when reading

This seems like a total non-starter. That can only negatively impact the answers. A solution needs to be totally decoupled from answer quality.

thewataccount 1066 days ago

The paper I linked describes the approach in the parent comment as the "Simple proof of concept" on page 2 and, like you said, outlines its limitations: it hurts performance and is easy to detect and work out.

Their improved method instead only replaces tokens when there are many good choices available, and skips replacing tokens when there are few good choices. "The quick brown fox jumps over the lazy dog" - "The quick brown" is not replaceable because it would severely harm the quality.

Essentially it's only replacing tokens where it won't harm the performance.

It's worth noting that any watermarking will likely harm the quality to some degree - but it can be minimized to the point of being viable.
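A rough sketch of how that "only where there are many good choices" behaviour can fall out of a small logit bias, in the spirit of the linked paper's soft watermark. This is not the paper's code: the function names, the key, the SHA-256 seeding, and the values of `gamma` and `delta` are all assumptions made for illustration.

```python
import hashlib
import math
import random

KEY = b"hypothetical-secret-key"  # assumed secret, not from the paper


def green_list(prev_token: int, vocab_size: int, key: bytes, gamma: float = 0.5) -> set[int]:
    """Pick a pseudo-random fraction gamma of the vocabulary, seeded by key + previous token."""
    digest = hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermarked_probs(logits: list[float], prev_token: int, key: bytes,
                      delta: float = 2.0, gamma: float = 0.5) -> list[float]:
    """Add delta to the logits of green tokens, then apply a softmax."""
    green = green_list(prev_token, len(logits), key, gamma)
    biased = [x + delta if i in green else x for i, x in enumerate(logits)]
    top = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - top) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


flat = [0.0] * 10              # many equally good choices: the bias matters
peaked = [20.0] + [0.0] * 9    # one obviously right token: the bias barely matters

green = green_list(7, 10, KEY)
print(sum(watermarked_probs(flat, 7, KEY)[i] for i in green))
print(watermarked_probs(peaked, 7, KEY)[0])
```

With flat logits most of the probability mass moves onto the green half, while the peaked case still picks the obvious token almost every time, which is why quality suffers little on low-entropy text.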

yttribium 1067 days ago

You can do this by injecting non-visible Unicode (LTR / RTL markers, zero-width separators, the various "space" analogs, homoglyphs of "normal" characters) but it can obviously be stripped out.

brucethemoose2 1067 days ago

Make half of the tokens (the AI's "dictionary") slightly more likely.

This would not impact output quality much, but it would only work for longish outputs. And the token probability "key" could probably be reverse engineered with enough output.
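To make the "only works for longish outputs" point concrete, here is a toy sketch of detection with a one-proportion z-test. Everything in it is an assumption for illustration: the vocabulary size, the integer key, the 60/40 bias, and `toy_generate`, which stands in for a real model by emitting a favoured ("green") token with a fixed probability.

```python
import math
import random

VOCAB_SIZE = 1000  # hypothetical vocabulary size


def keyed_green_set(key: int, vocab_size: int = VOCAB_SIZE) -> set[int]:
    """The secret 'key': a pseudo-random half of the vocabulary."""
    rng = random.Random(key)
    return set(rng.sample(range(vocab_size), vocab_size // 2))


def z_score(tokens: list[int], green: set[int]) -> float:
    """How many standard deviations the green count sits above the 50% expected by chance."""
    t = len(tokens)
    hits = sum(tok in green for tok in tokens)
    return (hits - 0.5 * t) / math.sqrt(t * 0.25)


def toy_generate(n: int, green: set[int], p_green: float, rng: random.Random) -> list[int]:
    """Stand-in for a model: emit a green token with probability p_green."""
    green_ids = sorted(green)
    red_ids = sorted(set(range(VOCAB_SIZE)) - green)
    return [rng.choice(green_ids if rng.random() < p_green else red_ids) for _ in range(n)]


green = keyed_green_set(key=1234)
rng = random.Random(0)
short_marked = toy_generate(25, green, 0.6, rng)    # slightly biased, short
long_marked = toy_generate(2500, green, 0.6, rng)   # slightly biased, long
unmarked = toy_generate(2500, green, 0.5, rng)      # no bias

for name, toks in [("short", short_marked), ("long", long_marked), ("unmarked", unmarked)]:
    print(name, round(z_score(toks, green), 2))
```

With a 60/40 bias the expected z-score is about 0.2 times the square root of the token count, so 25 tokens average around 1 (indistinguishable from chance) while 2500 tokens average around 10.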

pixl97 1066 days ago

It would be pretty easy to figure out against standard word probability in average datasets. Even then the longer this system runs the more likely it is to pollute its own dataset by people learning to write from gpt itself.

taneq 1067 days ago

type=text/chatgpt :P

merlincorey 1067 days ago

Invisible characters in a specific bit-pattern.

Pretty common steganographic technique, really.
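A minimal sketch of what that could look like, assuming U+200B (zero-width space) encodes a 0 bit and U+200C (zero-width non-joiner) encodes a 1 bit. The choice of characters, the placement after the first word, and the `embed` / `extract` names are all made up for illustration.

```python
ZERO = "\u200b"  # zero-width space, used here as bit 0
ONE = "\u200c"   # zero-width non-joiner, used here as bit 1


def embed(text: str, payload: bytes) -> str:
    """Hide payload as a run of zero-width characters after the first word."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    hidden = "".join(ONE if bit == "1" else ZERO for bit in bits)
    head, sep, tail = text.partition(" ")
    return head + hidden + sep + tail


def extract(text: str) -> bytes:
    """Read the zero-width characters back into bytes, ignoring everything else."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


marked = embed("hello world", b"id42")
print(marked == "hello world")   # False, although it renders the same in most fonts
print(extract(marked))
```

The payload survives a plain copy-paste as long as nothing along the way normalizes or filters the text.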

nonethewiser 1067 days ago

Can you elaborate on "invisible"? The only invisible character I can imagine is a space. It seems like any other character either isn't invisible or doesn't exist (i.e., isn't a character).

Additionally, if I copy-paste text like this, are the invisible characters preserved? Are there a bunch of extra spaces somewhere?

michaelt 1067 days ago

When students try to evade plagiarism detectors, they will swap characters like replacing spaces with nonbreaking spaces, replacing letters with lookalikes (I vs extended Cyrillic Ӏ etc), and inserting things like the invisible 'Combining Grapheme Joiner'

IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.
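For a sense of how little such a site would need, here is a sketch using only Python's standard `unicodedata` module. The `HOMOGLYPHS` table is a tiny hypothetical sample, not a real confusables list, and `strip_tricks` is an invented name. Dropping every format character (category `Cf`) is deliberately blunt: it would also remove legitimate ones such as the zero-width joiner inside emoji sequences, and the lookalike map would mangle genuine Cyrillic text.

```python
import unicodedata

# Tiny illustrative lookalike map (hypothetical, far from complete).
HOMOGLYPHS = {"\u04c0": "I", "\u0430": "a", "\u0435": "e", "\u043e": "o"}
CGJ = "\u034f"  # Combining Grapheme Joiner (category Mn, so it needs its own check)


def strip_tricks(text: str) -> str:
    """Undo common invisible-character and lookalike substitutions."""
    # NFKC folds compatibility characters, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", text)
    out = []
    for ch in text:
        # Category Cf covers zero-width spaces and LTR / RTL marks, among others.
        if ch == CGJ or unicodedata.category(ch) == "Cf":
            continue
        out.append(HOMOGLYPHS.get(ch, ch))
    return "".join(out)


dirty = "\u04c0t\u00a0is\u200b f\u034fine\u200e"
print(strip_tricks(dirty))
```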

jstarfish 1066 days ago

> IMHO it isn't a feasible way of watermarking text though - as someone would promptly come up with a website that undid such substitutions.

It doesn't matter since there's no one-pass solution to counterfeiting.

You have the right of it-- the best you can hope for is adding more complexity to the product, which adds steps to their workflow and increases the chances of the counterfeiter overlooking any particular detail that you know to look for.

lucasmullens 1067 days ago

There's a bunch of different "spaces", one is a "zero-width space" which isn't visible but still gets copied with the text.

https://en.wikipedia.org/wiki/Zero-width_space

pixl97 1066 days ago

And the second site students will go to is zerospaceremover.com or whatever will show up to strip the junk.

philipov 1067 days ago

So, all I have to do is copy-paste it into a text editor with remove-all-formatting to circumvent that?

mepian 1067 days ago

If it's generated by a SaaS, the service could sign all output with its private key, so anyone could verify it with the public key.

meandmycode 1067 days ago

This isn't a watermark though. The idea of a watermark is that it's inherently embedded in the data itself while not drastically changing the data.

csmpltn 1066 days ago

Why is this comment being downvoted?

OpenAI can internally keep a "hash" or a "signature" of every output it ever generated.

Given a piece of text, they should then be able to trace it back to the specific session (or set of sessions) in which it was generated.

Depending on the hit rate and the hashing methods used, they may be able to indicate the likelihood of a piece of text being generated by AI.

pixl97 1066 days ago

Why would they want to is my question. A single character change would break it.

Then you have database costs of storing all that data forever.

Moreover, it's only for OpenAI. I don't think it will be too long before other GPT-4-level models are around and won't give two shits about catering to the AI identification police.

csmpltn 1066 days ago

> A single character change would break it.

That depends on how they hash the data, right? They can use various types of Perceptual Hashing [1] techniques which wouldn't be susceptible to a single-character change.

[1] https://en.wikipedia.org/wiki/Perceptual_hashing

> Then you have database costs of storing all that data forever.

A database of all textual content generated by people? That sounds like a gold mine, not a liability. But as I've mentioned earlier, they don't need to keep the raw data (a perceptual hash is enough).

> won't give two shits about catering to the AI identification police

I'm sure there will be customers willing to pay for access to these checks, even if they're only limited to OpenAI's product (universities and schools - for plagiarism detection, government agencies, intelligence agencies, police, etc).
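As a rough illustration of a fingerprint that tolerates small edits, here is a sketch that stores hashed word 3-grams ("shingles") instead of raw text and compares them by Jaccard overlap. This is a simple stand-in, not a true perceptual hash; MinHash or SimHash would compress the set further. The function names, the shingle size, and the sample sentences are assumptions for illustration.

```python
import hashlib


def shingle_fingerprint(text: str, k: int = 3) -> frozenset[int]:
    """Set of 64-bit hashes of overlapping k-word shingles; the raw text is not kept."""
    words = text.lower().split()
    shingles = (" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1)))
    return frozenset(
        int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles
    )


def similarity(a: frozenset[int], b: frozenset[int]) -> float:
    """Jaccard overlap between two fingerprints, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = ("the quick brown fox jumps over the lazy dog "
            "while the cat sleeps in the warm afternoon sun")
edited = original.replace("dog", "dig")  # a single character change
unrelated = "watermarks must survive paraphrasing attacks to be useful in practice"

stored = shingle_fingerprint(original)
print(similarity(stored, shingle_fingerprint(edited)))
print(similarity(stored, shingle_fingerprint(unrelated)))
print(hashlib.sha256(original.encode()).digest() == hashlib.sha256(edited.encode()).digest())
```

A plain cryptographic hash of the whole text changes completely after the one-character edit, while the shingle fingerprint still overlaps heavily. A thorough paraphrase would defeat this scheme too.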

hackernewds 1067 days ago

So other text should not be tagged as AI generated?
