by javajosh 3940 days ago
Everyone's focusing on this being a PNG problem, but if my server unzips a 420-byte file into a 5 MB file of any kind, I'd say that's the first red flag. Assuming some sort of streaming decompression, you could write an output filter that shuts off the decompressor once the output exceeds X times the input size. A reasonable factor would be 10, which in this case would have halted bzip decompression at about 4 kB.

This would probably be a trivial patch to bzip2. But I like the idea in general of passing a "max output/input ratio" to any process or function that might yield far more output than input.
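
A minimal sketch of that output filter in Python rather than as a bzip2 patch, assuming the standard library's `bz2.BZ2Decompressor` and its `max_length` argument (Python 3.5+). The names `bounded_bunzip2`, `RatioExceeded`, `max_ratio` and `chunk_size` are made up for illustration:

```python
import bz2


class RatioExceeded(Exception):
    """Raised when the output grows past max_ratio times the input size."""


def bounded_bunzip2(data: bytes, max_ratio: int = 10, chunk_size: int = 4096) -> bytes:
    # Hypothetical helper: decompress one bzip2 stream, but give up as soon
    # as the output is more than max_ratio times larger than the input.
    limit = max_ratio * len(data)
    decomp = bz2.BZ2Decompressor()
    out = bytearray()
    pending = data
    while not decomp.eof:
        # max_length caps how much output a single call may produce, so a
        # bomb never expands by more than chunk_size bytes past the limit.
        piece = decomp.decompress(pending, max_length=chunk_size)
        pending = b""  # the decompressor buffers unconsumed input itself
        out += piece
        if len(out) > limit:
            raise RatioExceeded(f"output exceeded {max_ratio}x the input size")
        if not piece and not decomp.eof:
            raise ValueError("truncated bzip2 stream")
    return bytes(out)
```

With the default factor of 10, a 420-byte input is cut off once the output passes 4200 bytes, plus at most one `chunk_size` of overshoot.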

1 comment

The real problem is image handling libraries that blindly render images into too-large objects where unnecessary. While full-res uncompressed images are very convenient under the hood, the image library should inherently handle anything "too big" gracefully. Instead, apps often crash when someone feeds in a ridiculously large image.

A 420 B to 5 MB expansion should not be a "red flag" because there is nothing about it (including the subsequent attempt to process a 141 GB uncompressed image) which cannot be handled appropriately in software. Such ratio limits are arbitrary, and setting an arbitrary limit is usually a sign the software is incorrect, not the data.

A ratio limit is a heuristic.

There is an upper limit to how much information you can compress into a given space. (Note that we may want to write a pathological program that is very small and allocates a lot of information-free memory. But that's not decompression.)

If we accept that premise, then we can look at another approach to solving this problem, once and for all! I like examining memory allocation because it's so general. But there may be another way: we can examine the input to estimate the compression ratio.

The problem here is that image decompression apparently gives strangers the ability to provide an arbitrary N and say "Please loop N times and/or allocate N bits". A modern CPU is overwhelmed by an N 12 bits long or longer. This is a root cause of many problems!

You know, I'm going to go out on a limb here and make a bold assertion: there is a very safe upper bound on the decompression ratio, and for any real algorithm you can indeed examine the input to determine whether N exceeds your allowable threshold. 10x might be a bit low (although I doubt it), so let's be generous and say 100x. (Which seems crazy. Nothing that I know of, not even text, compresses that well.)

This means that I believe any image format, for example, has a trivially calculable N (for example, width*height in pixels). I would argue that in the general case (unless you are doing some sort of compsci research) the image file size should be related to N. That is, if the width and height are each 10-bit numbers, we should expect the file size to be roughly a 20-bit number.
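
For PNG specifically, a rough sketch of "examine the input" could look like the following. It assumes the standard PNG layout (8-byte signature, then an IHDR chunk carrying width, height, bit depth and color type) and does not verify the chunk CRC. The names `declared_png_bytes`, `looks_like_bomb` and `max_ratio` are hypothetical, and the 100x default is just the generous figure from above:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Samples per pixel for each PNG color type (grayscale, RGB, palette,
# grayscale+alpha, RGBA).
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}


def declared_png_bytes(data: bytes) -> int:
    # Hypothetical helper: size of the decoded pixel data that the header
    # claims, computed without touching the compressed image stream.
    if len(data) < 33 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG with a leading IHDR chunk")
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    if color_type not in CHANNELS:
        raise ValueError("unknown PNG color type")
    bits_per_row = width * CHANNELS[color_type] * bit_depth
    return height * ((bits_per_row + 7) // 8)


def looks_like_bomb(data: bytes, max_ratio: int = 100) -> bool:
    # Compare the N the file asks for against the size of the file itself.
    return declared_png_bytes(data) > max_ratio * len(data)
```

The check runs before any decompression or allocation, so a file that fails it never gets to ask for its N.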