Hacker News new | ask | show | jobs
by ke88y 1069 days ago
> There isn't space for invisible watermarks by design.

Very incorrect. Steghide [1] supports JPEG. JPEG and other lossy image formats are ultimately just fancy file formats; there's nothing preventing you from encoding arbitrary messages in a compressed image.

> You can kind of get away with something "at the edge" that survives an initial JPEG encoding, but there's no way it's going to reliably survive e.g. resizing, cropping, and recompressing and still remain invisible.

I am pretty sure that I can design steganagraphy algorithm that disperses a small message across a JPEG in a way that is:

1. invariant to resizing (absolutely certain this is possible),

2. robust to cropping (invariant to cropping up to some limit is definitely possible; eg if you crop 100% of the image then obviously everything goes out the window),

3. robust or even invariant to recompression. This seems a lot harder but I'm pretty sure it's possible.

> Also, most AI generation content is presumably going to be text, not images. Good luck watermarking text that's a paragraph long. (There are potential tools that can operate on text the size of a news article, but are also trivially defeated by swapping a few prepositions and synonyms.)

Yeah, text seems more difficult. Images are also difficult/impossible if you assume the model user is adversarial and competent, which I'm not sure what you wouldn't assume.

For any particular model you can probably do detection with a fair bit of inaccuracy. But I would definitely put detection in the "doomed" category.

I also think the threat is real but wildly over-stated relative to the non-AI status quo. We're slightly democratizing Photoshop and copywriting skills, which weren't exactly scarce to begin with. It's not an AI problem, and it's barely a technology problem. It's primarily a political problem.

[1] https://github.com/StefanoDeVuono/steghide

1 comments

> 3. robust or even invariant to recompression. This seems a lot harder but I'm pretty sure it's possible.

No, that's my main point. By definition, "perfect" compression will discard everything not human-noticeable, which leaves no room for watermarks/steganography. So the only room for watermarks is in the margin where compression is currently imperfect, i.e. encoding more detail than needed.

But that's relying on artifacts that vary dramatically with compression technique (JPG vs PNG vs WEBM etc.), with basic image manipulation (adjusting brightness, contrast, color, etc.), and other basic operations like resizing. So as soon as you chain any of these together, watermarking falls apart.

> Steghide [1] supports JPEG.

Yes, I already said in my comment 'You can kind of get away with something "at the edge" that survives an initial JPEG encoding'. But as I'm saying, it's not robust or reliable as images get reused. The whole point of a watermark is that it survives copying -- e.g. they would show up as dark text if you xeroxed a watermarked document. That type of robustness or reliability is just not possible here as users download and re-upload images that get re-encoded, because the entire point of image compression is to try to throw away anything and everything the human eye doesn't care about.

> perfect

I think you're being distracted and confused by imprecise and inaccurate descriptions of the intent of various algorithms, instead of considering what actual algorithms actually do.

No existing compression algorithm was designed to be "perfect" in your sense of the word, and none are. They were designed to be good enough, under a lot of different constraints (ease of implementation, extant mathematical tools/knowledge at the time, computation time, etc.)

Instead of talking in hand-wavy terms about hypothetical objects in a wishy-washy way, let's remember that lossy image formats and compression schemes are just pieces of mathematics. E.g., the basic JPEG algorithm is a fairly simple procedure that can be explained to any college student and even moderately above average middle schoolers. Is it perfect? No. Is it what actually gets used in reality? Yes.

> That type of robustness or reliability is just not possible here as users download and re-upload images that get re-encoded, because the entire point of image compression is to try to throw away anything and everything the human eye doesn't care about.

Let F : IMG -> IMG be a set of functions that most/all users and platforms use for compressing images. The question is whether there exist a pair of functions s,t such that for an image i \in IMG:

1. s(i) is roughly the same as s to the bare human eye but contains a message m. (Or not? Depends on the use-case.)

2. t(s(i)) ~= m for some notion of similarity ~= which is sufficient for watermarking.

3. for any f1, ..., fn \in F, t(fn(...f1(s(i))...)) ~= m.

We can relax constraint 1 because we probably only care about a subset of IMG, etc. etc.

Your impossibility conjecture about the existence of s,t for common extant F's doesn't seem nearly as obvious as you're claiming. And there are CERTAINLY choices of F for which s,t do exist. Eg for the basic JPEG algorithm I'm pretty darn confident I can design s,t that are robust to various parameters and also where you only need at least k pixels uncropped to recover a message ~= m, for example. And not just design it, but write a fairly short and intuitive mathematical proof explaining precisely why it works.

In fact, if you know how JPEG works and other applications of Fourier transforms (eg in acoustics and perhaps also crypto), you might see why it would be more surprising if doing this were NOT possible at least for various JPEG implementations/parameterizations!

Stepping aside from JPEG in particular, you might need to know something about how each function in F works, perhaps intimately, and there is probably some clever mathematics involved for many choices of F.

you could also view this as an optimization problem and use various tools that got really popular in 2016 or so to build quite robust solutions that don't depend on the particularities of your choice of F. I'm less certain this would give absolute guarantees but I bet you'd end up with stuff that works well in practice.

But in any case, it's unclear why you are so convinced this is impossible.