Hacker News new | ask | show | jobs
by ant6n 1223 days ago
Mmh, compression ratio doesnt seem super high. I wonder how the values are stored, perhaps floats? Some thoughts:

Maybe using yuv would be better than rgb?

Maybe values could be represented as 0..1, and the values itself could be stored as 0.16 fixed point numbers.

Use some generic compression on top.

Is there a smart way to store the basis vectors? Something about angles somehow (analogue to quaternions)? Also, once U is known, isnt the inverse implied? Also, if U is approximated and fixed, perhaps the singular values can be adjusted again to minimize errors.

2 comments

> Is there a smart way to store the basis vectors?

Consider, instead, compressing small blocks of the image instead of a whole image plane at once. Let's say 8x8.

Over such a small region, pixels are usually very highly correlated. You could model them statistically as an order-1 autoregressive process with a high correlation coefficient (e.g., x[i + 1] = x[i]*rho + noise, with rho >= 0.95 usually, separately in both the horizontal and vertical directions).

Then, computing the eigendecomposition of the 8x8 cross-correlation matrix C_{ij} = rho^|i - j| produces a set of common eigenvectors that you can use for all blocks (recall that the left and right singular vectors U and V of a matrix M are just the eigenvectors of M.M^T and M^T.M, respectively). So now instead of several very large vectors, you only need a single 8x8 matrix of basis vectors (because it's square and orthonormal, the inverse is just the transpose).

But wait. Let's take the limit as rho -> 1. It just so happens that the resulting eigenvectors converge to the Type 2 Discrete Cosine Transform [0]. So, in fact, you don't need to transmit the basis at all.

You can store the output of the transform as fixed point numbers with some arbitrary base (the choice controls the amount of compression: anything too small just gets rounded to zero). Maybe use run-length encoding as your generic compression, because you expect to have a lot of zeroes (visit them in a zig-zag order, to get the big values first and maximize the length of the runs of zeroes), with some Huffman coding of the actual symbols. Add a simple predictor for the (0,0) coefficient of each block (it winds up being the average of the pixel values, so it should be pretty similar from block to block).

Congratulations, you just invented JPEG.

[0] http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1672...

I’m guessing there’s just a lot left on the table—

> Maybe values could be represented as 0..1, and the values itself could be stored as 0.16 fixed point numbers. Use some generic compression on top.

Codecs like JPEG use a variable amount of quantization. You quantize by scaling the 0..1 float by some scalar, putting it in the range 0..k, and then truncating it to an integer, and encoding the integer. The integer value is encoded using an entropy coder like Huffman. The parameter k must also be encoded somehow, or fixed.

Look up “JPEG coefficient quantization” for how JPEG does it.

Codecs for audio and images are often made up of understandable parts that fit together: transformations, quantization, entropy coding. If you come up with a new transformation, you can make a whole codec by putting together the remaining pieces—but something like a new transformation is itself interesting, because somebody else can always assemble it into a codec if it shows promise.

> Codecs like JPEG use a variable amount of quantization

The reason quantization works so well in JPEG is because of the DCT step and its energy compaction properties. This gets most of the coefficients near zero. I think without this transform you would be introducing a lot more noise in the final result.

At some point, we are going to end up re-implementing a thing approximating jpeg here. Colorspace convert, subsampling & DCT+quantization is most of the magic sauce.

The singular value decomposition is doing the same thing as the DCT, sort of. It’s transforming the image into coefficients where the important coefficients are packed in one location. It’s worth applying similar techniques. You can see the demo where it’s truncating the coefficients.

Half the image codecs out there look like JPEG if you squint hard enough. That’s ok, “reimplementing a thing approximating” another codec is how a lot of marginal improvements have been made in the past, and if you add up enough marginal improvements, you end up with a competitive new codec in its own right.

The DCT isn’t magic, it just happens to work really well and requires little computational power. There’s a lot of possibilities for replacing the DCT with something else, especially now that we have more computational power to throw around.