Hacker News
by hansword 1463 days ago
I love this and would have dearly needed it like 5 years ago. Now, it is still a very interesting read.

But given what we have already seen from Nvidia on video compression [0], I think within the next few years, we will move everything to machine-learning-'compressed' images (aka transmitting a super-low-res seed image and some additional ASCII and having an ML model reconstruct and upscale it at the client side).

[0] https://www.dpreview.com/news/5756257699/nvidia-research-dev...

3 comments

Most images are still JPEG (3 decades old) or PNG (2.5 decades old). Countless better formats have been developed, but with the exception of WebP we are still using the same image formats that existed during the dot-com bubble. Ubiquity trumps improvements in file size.

Better encoders for JPEG or PNG are the main way to achieve improvements without compatibility problems, and I think that will stay true for another decade, if not more.

I fear that we will get used to some image patterns feeling right instead of reality if AI gets involved too much.

Honestly, this scares the shit out of me.

Lossy compression is one thing, but having an ML model suggest what the pixels should be, versus a mathematical formula, is a totally different thing.

What if it's an ML model suggesting parameters for JPEG? It could still hallucinate to some degree, but it's also more limited.

Image -> mathematical formula to toss data -> reverse formula -> slightly altered image

vs

Image -> mathematical formula to toss data -> ML to recreate what it thinks is supposed to be there -> made up image based on "training" data not even from original image

that's my problem
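
To make the first pipeline above concrete, here is a minimal sketch of "formula to toss data -> reverse formula", assuming a 1-D orthonormal DCT and a single uniform quantization step. That is a simplification, not how any particular JPEG encoder is implemented, and the pixel row and the function names (`dct`, `idct`, `lossy_roundtrip`) are made up for illustration.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a list of samples."""
    n = len(block)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(block))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct(coeffs):
    """Inverse of dct(): rebuilds the samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def lossy_roundtrip(block, step):
    """Transform, quantize (the only place data is tossed), then invert."""
    quantized = [round(c / step) for c in dct(block)]
    restored = idct([q * step for q in quantized])
    return [round(v) for v in restored]

# Hypothetical row of 8 grey values, not taken from any real image.
pixels = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = lossy_roundtrip(pixels, step=10)
max_error = max(abs(a - b) for a, b in zip(pixels, decoded))
print(decoded, max_error)
```

The decoder here is a fixed formula, so how far the output can drift from the input is limited by the quantization step: a coarser `step` means a more altered image, and a very fine `step` gives the original row back.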

When you think "ML to recreate what it thinks is supposed to be there" you probably automatically go to DeepDream or https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...

but the end result doesn't have to be the direct output of an ML hallucination. The AI encodes a probability distribution, and you can treat it like motion compensation in video codecs: what comes next is a correction by the encoded error between the predicted outcome and the ground truth.
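
A rough sketch of that predict-then-correct idea, assuming a lossless setup where the stored error is exact. The three predictors are hypothetical stand-ins (no ML involved) whose only job is to show that a better guess leaves less to store, while the decoded result is the same either way.

```python
def encode(pixels, predict):
    """Store only the error between each prediction and the true value."""
    return [actual - predict(pixels[:i]) for i, actual in enumerate(pixels)]

def decode(residuals, predict):
    """Replay the same predictor and correct each guess with the stored error."""
    pixels = []
    for r in residuals:
        pixels.append(predict(pixels) + r)
    return pixels

def predict_zero(history):
    """Worst case: no prediction at all."""
    return 0

def predict_previous(history):
    """Guess that the next value repeats the last one."""
    return history[-1] if history else 0

def predict_linear(history):
    """Guess that the last step repeats (a crude stand-in for a smarter model)."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]

def cost(residuals):
    """Crude proxy for stored size: total magnitude of the errors."""
    return sum(abs(r) for r in residuals)

# Hypothetical smooth gradient, invented for this example.
pixels = [100, 102, 104, 106, 108, 110, 112, 114]
for predictor in (predict_zero, predict_previous, predict_linear):
    residuals = encode(pixels, predictor)
    assert decode(residuals, predictor) == pixels  # always exact
    print(predictor.__name__, residuals, cost(residuals))
```

The predictor only changes how large the residuals are. A bad prediction costs more to correct, but the correction still lands on the ground truth.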

So how is that different from motion estimation as it currently stands? That at least sees where pixels are and then where they will be. So instead of storing all of that data, just store where they start and then end and then tween the diff. Isn't that what this "new" ML you just described does, but "different" by slapping "trained ML/AI" on it?

The difference is that the better you can predict the motion, the less data you have to store, and ML models are much better than hand-tuned heuristics at predicting motion. It's no different than the recent use of ML for chess programs. The search techniques remain pretty similar, but neural networks are often much better at evaluation of objective criteria than hand-coded heuristics.

ML-based image compression schemes generally don't let the net make up data; they use a net as a prior to reduce the entropy of the data that's there.
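
As a toy illustration of "a prior to reduce the entropy", the sketch below compares the ideal entropy-coder cost of the same symbols under a uniform prior and under a fitted one. The smoothed frequency table is only a stand-in for a trained net, and the training data, the message, and the helper names are all invented for the example.

```python
import math
from collections import Counter

def code_length_bits(symbols, prob):
    """Ideal entropy-coder cost: -log2 p(s) bits for each symbol."""
    return sum(-math.log2(prob(s)) for s in symbols)

def uniform_prior(symbol):
    """No knowledge: every 8-bit value is equally likely."""
    return 1 / 256

def make_learned_prior(training, smoothing=1.0):
    """Smoothed frequency model over 0..255, a toy stand-in for a trained net."""
    counts = Counter(training)
    total = len(training) + smoothing * 256
    return lambda s: (counts[s] + smoothing) / total

# Hypothetical training data: mostly black pixels, some white.
learned_prior = make_learned_prior([0] * 900 + [255] * 100)

message = [0, 0, 0, 255, 0, 0, 0, 0]
print(code_length_bits(message, uniform_prior))   # 8 bits per symbol
print(code_length_bits(message, learned_prior))   # far fewer for typical data

# An unexpected value is still representable, it just costs more bits.
print(code_length_bits([17], learned_prior))
```

In this setup the prior never changes which symbols come out of the decoder, only how many bits they cost. Data the model did not expect gets more expensive rather than being replaced.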
What I'm saying is I'm not sure how much ML can imagine just by changing coefficient precision.