

by pksebben 1220 days ago

This is the crux of my issue with the term "compression" in this context: is it a smaller version of the data?

Yes, the model is smaller than the total input data. But when it comes to recreating a single image, how many of the weights must be configured 'just so' to recreate that image closely enough to call it the same image? I'll admit ignorance here, but I also don't think this is something anyone knows for sure. We can only just extract Othello piece colors from a simplified, Othello-specialized model designed to recognize two colors.

How much of the information from other images must be present to perform this task?

My instinct, given my understanding of how these things work, is that to replicate an image with any recognizable fidelity, you have to overtrain the model so much that you've affected a set of weights much, much larger than the pixel data. The internal representation of these images is concerned with far more visual information than just 'this pixel is this color'. By looking at layered outputs from the inverse type of system (image recognition, which is the core component of these models), you can see that they're encoding layers of shading, lines that map to brushstrokes or object boundaries, foreground, background, all kinds of stuff. A direct representation of an image with all of these would necessarily be huge, and we know this because we have them: artists use layers in all kinds of image-creation software, and those layered files are always way bigger than the JPEG itself.

I get that this may sound pedantic, but the term 'compression' doesn't seem to me to fit here. Compression, by definition, makes stuff smaller.


temp512345 1213 days ago

Fair enough, maybe 'compression' is too specific a term to apply here, but it does not matter whether it's compression or not when it comes to violating copyright. Compression was a good example to mention because it is already familiar to laypeople and to established law. The main point is that the model stores some sample of the original data, or, beyond that, data derived from the original (your brushstrokes example), and applies some algorithm to reconstruct it to an approximation that we humans might find indistinguishable from the original.
