by GrantS
3623 days ago
Impressive work, Daniel! Do I understand correctly that any image prediction whose deltas are smaller in absolute value than the full JPEG/DCT coefficients would offer continued compression benefits? As in, if you could "name that tune" and predict the rest of the entire image from the first few pixels, the rest of the image would be stored almost for free (and if not, it essentially falls back to regular JPEG encoding). If that's the case, then not only could we use everything we've decompressed so far for prediction (which is like one-sided image in-painting), but we could also store a few bits of semantic information about the content of the original image before re-compression (e.g. from an ImageNet-based CNN, or from face detection) and use that semantic information for prediction as well, via some generative model. All of this would obviously trade computation for storage/bandwidth, but this seems like an exciting direction to me. Again, nice work.
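To make the "smaller deltas cost fewer bits" intuition concrete, here is a minimal sketch. It assumes a plain order-0 Exp-Golomb code as a stand-in entropy coder (Lepton actually uses an adaptive arithmetic coder, not this), and the coefficient and prediction values are made up for illustration. Small residuals get short codewords, so a good predictor shrinks the total, while an all-zero prediction leaves the residuals equal to the coefficients and costs the same as coding them directly.

```python
def zigzag(v: int) -> int:
    """Map signed ints to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1


def exp_golomb_bits(v: int) -> int:
    """Length in bits of the order-0 Exp-Golomb codeword for a signed value."""
    u = zigzag(v)
    return 2 * (u + 1).bit_length() - 1


def total_bits(values) -> int:
    return sum(exp_golomb_bits(v) for v in values)


# Hypothetical DCT coefficients and a hypothetical predictor's guesses for them.
coeffs = [52, -31, 18, 12, -7, 5, 0, -3]
predictions = [50, -30, 16, 12, -6, 4, 1, -3]

# Only the residuals (actual - predicted) need to be stored.
residuals = [c - p for c, p in zip(coeffs, predictions)]

# A useless predictor (all zeros) leaves the residuals equal to the coefficients.
zero_pred_residuals = [c - 0 for c in coeffs]

raw = total_bits(coeffs)
pred = total_bits(residuals)
print(f"raw: {raw} bits, residuals: {pred} bits")  # raw: 64 bits, residuals: 24 bits
```

The decoder can rebuild each coefficient as prediction + residual only because it runs the same predictor on data it has already decoded, which is why prediction may use only the "one-sided" context.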
As for having a mega-model that predicts all images better: it turns out that with the Lepton model you only lose a few tenths of a percent by training the model from scratch on each image individually. We have a test case for training a global model in the repository (it's https://github.com/dropbox/lepton/blob/master/src/lepton/tes... ). It trains the "perfect" Lepton model on the current image, then uses that same model to compress the image. It's not meant to be a fair test, but it gives us a best-case scenario for the potential gains from a model trained on many images, and even in a controlled situation like the test suite it doesn't gain much.
However, the idea you mention here may still be a good one for a hypothetical model; we just haven't identified that model yet.
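Why per-image training costs so little can be illustrated with a toy example. This is a minimal sketch assuming a single binary context coded with a Krichevsky-Trofimov (KT) count estimator. That estimator is a stand-in chosen for illustration, not Lepton's actual probability model, and the bit stream is synthetic. The adaptive model starts with no knowledge and learns as it codes. Its total code length is compared with the empirical entropy of the same data, which is what an idealized model pre-trained on exactly this data (like the "perfect" model in the test above) would achieve.

```python
import math
import random


def adaptive_bits(bits) -> float:
    """Ideal code length (bits) of a model learned from scratch: each bit is
    coded with a probability estimated from the counts seen so far (KT estimator)."""
    zeros = ones = 0
    total = 0.0
    for b in bits:
        p_one = (ones + 0.5) / (zeros + ones + 1.0)
        total += -math.log2(p_one if b else 1.0 - p_one)
        if b:
            ones += 1
        else:
            zeros += 1
    return total


def static_bits(bits) -> float:
    """Ideal code length under the best fixed model for this exact data
    (empirical entropy), i.e. an idealized 'perfect pre-trained' model."""
    n = len(bits)
    k = sum(bits)
    if k in (0, n):
        return 0.0
    p = k / n
    return -(k * math.log2(p) + (n - k) * math.log2(1 - p))


# Synthetic, skewed bit stream standing in for one context's decisions.
rng = random.Random(0)
data = [1 if rng.random() < 0.1 else 0 for _ in range(100_000)]

a = adaptive_bits(data)
s = static_bits(data)
print(f"adaptive: {a:.1f} bits, perfect static: {s:.1f} bits, "
      f"overhead: {100 * (a - s) / s:.4f}%")
```

For a binary context the learning overhead of the KT estimator is bounded by about ½·log2(n) + 1 bits, which is negligible next to tens of thousands of payload bits. A real coder has many contexts, each paying its own learning cost, so the total overhead is larger than in this single-context toy.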