by yorwba 2989 days ago
All lossy compression algorithms hallucinate. That's the whole point: reducing image size by dropping some information and then hallucinating a plausible replacement to decompress.

The only difference is that this compression is better at hallucinating, so you don't get ringing artifacts or blocks, but some internally consistent alternate reality.

If you don't want to lose data you should not use lossy compression at all. JPEG can erase the distinction between digits as well.
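
To make the "drop information, then guess" point concrete, here is a minimal sketch of a toy lossy codec. It is not how JPEG or any neural codec actually works; `STEP`, `compress`, and `decompress` are made-up names for illustration, and plain scalar quantization stands in for the real transforms.

```python
# Toy lossy codec: quantize 8-bit samples to a coarse grid, then reconstruct.
# STEP is a hypothetical parameter: a larger step means smaller output and more loss.
STEP = 16

def compress(samples):
    # Keep only the bucket each sample falls into; the remainder is discarded.
    return [s // STEP for s in samples]

def decompress(codes):
    # The decoder cannot know the discarded remainder, so it guesses the bucket midpoint.
    return [c * STEP + STEP // 2 for c in codes]

original = [3, 8, 200, 210]
restored = decompress(compress(original))
print(restored)  # [8, 8, 200, 216]
```

The samples 3 and 8 come back as the same value, which is the same kind of collapse as two digits becoming indistinguishable. A better codec only makes a better guess; it still cannot recover the remainder.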


Okay, but still, some kinds of changes are better than others. They should probably start testing for this in the visual perception tests: is lost information greyed out in a visible way? Are words and digits always fuzzed, or replaced?

Because it turns out that fuzziness and compression artifacts have a higher-level meaning: when you see them, you know something has been lost. That's an important (if inadvertent) signal. We need to make sure the artifacts don't go away.

I can kind of see regulations coming that require neurally compressed image or video data to carry a little Ⓝ on-screen graphic in one of the corners, in addition to (not necessarily perceivable) watermarks that make even small crops of the image identifiable as neurally generated. And that is probably the best case.

In the worst case (and more likely?), we are going to ban computational substrates large enough to perfectly forge important data altogether, because they will be too easy to misuse. We’d essentially go back to ~1960s electronics to have at least halfway functioning mechanisms for creating social trust, namely high-bandwidth personal interactions where every thought and every action has a high chance of leaving a trace in the real world and thus contributing to someone’s reputation. No blockchain and no other technology can create nearly as much trust as that without being highly prone to misuse.

It could be a problem if it hallucinates the wrong license plate number at a crime scene. If all you want is a gigantic 8K resolution stock photo of a woman holding her baby in front of a laptop without devouring 10 MB of the user's data cap, it may be fine if the woman has a slightly different (but still highly detailed) hair style.
It would always decompress the same way, so for artistic purposes, if you preview it and it looks good, it is good.

But if you're using photography to look at things in the world, that's a whole different story.

If fuzziness and compression artifacts are a signal, it makes sense that eliminating them would reduce filesize.
There is a difference between dropping data and replacing it with fake data. Unreadable text is better than a scanned cheque with a different amount.
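
A rough sketch of that difference, assuming a decoder that receives a digit string with some positions already lost (represented as `None`). Both function names and the `guess` parameter are hypothetical, purely for illustration:

```python
def decode_marking_loss(digits):
    # Lost positions (None) stay visibly unknown in the output.
    return "".join("?" if d is None else str(d) for d in digits)

def decode_filling_in(digits, guess=0):
    # Lost positions are silently replaced by a plausible-looking guess.
    return "".join(str(guess) if d is None else str(d) for d in digits)

amount = [1, None, 5, 0]  # second digit was lost in compression
print(decode_marking_loss(amount))  # 1?50
print(decode_filling_in(amount))    # 1050
```

The first output tells the reader something is missing; the second looks like a perfectly valid amount that may simply be wrong.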