lossy compression is one thing, but an ML model suggesting what the pixels should look like and a mathematical formula reconstructing them are totally different things.
Image -> mathematical formula to toss data -> reverse formula -> slightly altered image
vs
Image -> mathematical formula to toss data -> ML to recreate what it thinks is supposed to be there -> made up image based on "training" data not even from original image
but the end result doesn't have to be the direct output of an ML hallucination. The AI encodes a probability distribution, and you can treat its prediction like motion compensation in video codecs: what comes next is a correction by the encoded error (the residual) between the predicted outcome and ground truth.
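To make the "prediction plus encoded error" idea concrete, here is a minimal sketch of predictive coding on a 1-D list of integers. It is not how any particular codec works: `predict_next`, `encode`, `decode` and the `step` parameter are hypothetical names for illustration, and the predictor is a trivial "repeat the previous sample" rule standing in for whatever model (ML or hand-written) you like. The point is that the decoder never outputs the raw prediction; it always adds the stored residual on top.

```python
# Sketch only: all names here are illustrative assumptions, not a real codec API.

def predict_next(previous):
    """Stand-in predictor: guess that the next sample equals the previous one."""
    return previous


def encode(samples, step=1):
    """Return quantized residuals. With integer input and step=1 this is lossless."""
    residuals = []
    recon_prev = 0  # what the decoder will have reconstructed so far
    for s in samples:
        pred = predict_next(recon_prev)
        q = round((s - pred) / step)   # quantized prediction error
        residuals.append(q)
        recon_prev = pred + q * step   # mirror the decoder to avoid drift
    return residuals


def decode(residuals, step=1):
    """Rebuild the samples from the same predictor plus the stored residuals."""
    out = []
    prev = 0
    for q in residuals:
        value = predict_next(prev) + q * step
        out.append(value)
        prev = value
    return out


samples = [10, 11, 13, 13, 12, 40, 41]
residuals = encode(samples)                       # small numbers where the guess was good
decoded = decode(residuals)                       # identical to `samples`
lossy = decode(encode(samples, step=4), step=4)   # coarser residuals, bounded error
```

With `step=1` the output matches the input exactly no matter how bad the predictor is; a bad predictor only makes the residuals larger. With a coarser `step` the error per sample stays within half a step, because the encoder tracks the decoder's reconstruction.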
So how is that different from motion estimation as it currently stands? That at least sees where pixels are and then where they will be. So instead of storing all of that data, just store where they start and end, and then tween the diff. Isn't that what this "new" ML you just described does, only made "different" by slapping "trained ML/AI" on it?
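For reference, classic motion estimation of the "where pixels are and where they will be" kind can be sketched as an exhaustive block-matching search. This is a toy version under stated assumptions: frames are small lists of lists of integers, and `sad`, `block`, `paste` and `best_motion_vector` are hypothetical helper names, not functions from any real codec. The returned vector points from the block in the current frame back to its best match in the previous frame.

```python
# Sketch only: toy block matching on nested lists, names are illustrative.

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def block(frame, top, left, size):
    """Cut a size x size block out of a frame."""
    return [row[left:left + size] for row in frame[top:top + size]]


def paste(frame, top, left, patch):
    """Write a patch into a frame in place."""
    for i, row in enumerate(patch):
        frame[top + i][left:left + len(row)] = row


def best_motion_vector(prev, cur, top, left, size, radius):
    """Search prev within +/- radius for the block that best matches cur's block.

    Returns (cost, dy, dx); the first lowest-cost candidate wins.
    """
    target = block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if 0 <= y <= height - size and 0 <= x <= width - size:
                cost = sad(block(prev, y, x, size), target)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best


patch = [[5, 6], [7, 8]]
prev_frame = [[0] * 6 for _ in range(6)]
cur_frame = [[0] * 6 for _ in range(6)]
paste(prev_frame, 1, 1, patch)   # object at row 1, column 1
paste(cur_frame, 2, 3, patch)    # same object moved down 1, right 2

match = best_motion_vector(prev_frame, cur_frame, top=2, left=3, size=2, radius=2)
```

Here the search finds a perfect match one row up and two columns left, so only the vector needs to be stored for this block. In real footage the match is rarely perfect, and the leftover difference is what still has to be encoded.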
the difference is that the better you can predict the motion, the less data you have to store, and ML models are much better than hand-tuned heuristics at predicting motion. It's no different from the recent use of ML in chess programs. The search techniques remain pretty similar, but neural networks are often much better at evaluating objective criteria than hand-coded heuristics.
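A rough way to see "better prediction, less data" is to compare how many bits the residuals need under a weaker and a stronger predictor, using empirical entropy as a stand-in for coded size. Everything here is an assumption for illustration: the signal is a made-up quadratic ramp, `predict_previous` plays the weaker heuristic, and `predict_linear` plays the stronger predictor (it is not an ML model, just a better guesser for this signal). Both paths are lossless, so the reconstruction is identical and only the cost differs.

```python
# Sketch only: toy predictors and a synthetic signal, names are illustrative.
import math
from collections import Counter


def predict_previous(history):
    """Weaker predictor: repeat the last sample (0 if there is no history)."""
    return history[-1] if history else 0


def predict_linear(history):
    """Stronger predictor for smooth signals: extrapolate the last two samples."""
    if len(history) < 2:
        return history[-1] if history else 0
    return 2 * history[-1] - history[-2]


def residuals(signal, predictor):
    """Prediction error for every sample, given all samples before it."""
    return [s - predictor(signal[:i]) for i, s in enumerate(signal)]


def entropy_bits(values):
    """Empirical entropy in bits per symbol, a rough proxy for coded size."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


signal = [i * i for i in range(32)]

weak = residuals(signal, predict_previous)    # a different value every time
strong = residuals(signal, predict_linear)    # almost always the same value

weak_bits = entropy_bits(weak) * len(signal)
strong_bits = entropy_bits(strong) * len(signal)
```

On this signal the weaker predictor leaves residuals that are all distinct, while the stronger one leaves the same small value almost everywhere, so `strong_bits` comes out far below `weak_bits`. Swapping in a learned predictor changes nothing about the scheme, only how small the residuals get.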