|
This is the crux of my issue with the term "compression" in this context: Is it a smaller version of the data? Yes, the model is smaller than the total input data. But when it comes to recreating a single image, how many of the weights must be configured 'just-so' to recreate an image well enough to call it the same image? I'll admit ignorance here - but I also don't think this is something anyone knows for sure. We can only just extract Othello piece colors from a simplified, Othello-specialized model designed to recognize two colors. How much of the information from other images must be present to perform this task? My instinct, given my understanding of how these things work, is that to replicate an image with any recognizable fidelity, you have to overtrain the model enough that you've affected a set of weights much, much larger than the pixel data. The internal representation of these images is concerned with much more visual information than just 'this pixel is this color' - by looking at layered outputs from the inverse type of system (image recognition, which is the core component of these models), you can see that they're encoding layers of shading, lines that map to brushstrokes or object boundaries, foreground, background, all kinds of stuff. A direct representation of an image with all of these would necessarily be huge - and we know this because we have them. Artists use layers in all kinds of image-creation software, and the layered files are always way bigger than the JPEG itself. I get that this may sound pedantic, but the term 'compression' doesn't seem, to me, to fit here. Compression, by definition, makes stuff smaller. |