by comex 2984 days ago
Last time this came up on HN, I did some research, and discovered that lzip was quite non-robust in the face of data corruption: a single bit flip in the right place in an lzip archive could cause the decompressor to silently truncate the decompressed data, without reporting an error. Not only that, this vulnerability was a direct consequence of one of the features used to claim superiority to XZ: namely, the ability to append arbitrary “trailing data” to an lzip archive without invalidating it.

Like some other compressed formats, an lzip file is just a series of compressed blocks concatenated together, each block starting with a magic number and containing a certain amount of compressed data. There’s no overall file header, nor any marker that a particular block is the last one. This structure has the advantage that you can simply concatenate two lzip files, and the result is a valid lzip file that decompresses to the concatenation of what the inputs decompress to.

Thus, when the decompressor has finished reading a block and sees there’s more input data left in the file, there are two possibilities for what that data could contain. It could be another lzip block corresponding to additional compressed data. Or it could be any other random binary data, if the user is taking advantage of the “trailing data” feature, in which case the rest of the file should be silently ignored.

How do you tell the difference? Simply enough, by checking if the data starts with the 4-byte lzip magic number. If the magic number itself is corrupted in any way? Then the entire rest of the file is treated as “trailing data” and ignored. I hope the user notices their data is missing before they delete the compressed original…
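To make the failure mode concrete, here is a minimal Python sketch of that decision. Only the 4-byte magic check comes from the description above. The container layout (magic, a 4-byte length, a zlib payload) is a toy stand-in I made up for illustration, not the real lzip format, and `pack_member` and `lenient_decompress` are hypothetical names.

```python
import struct
import zlib

MAGIC = b"LZIP"  # 4-byte magic; everything else in this layout is a toy assumption


def pack_member(data: bytes) -> bytes:
    """Build one toy member: magic, payload length, zlib-compressed payload."""
    payload = zlib.compress(data)
    return MAGIC + struct.pack("<I", len(payload)) + payload


def lenient_decompress(archive: bytes) -> bytes:
    """Decode members until the input ends or the magic check fails."""
    out = bytearray()
    pos = 0
    while pos < len(archive):
        if archive[pos:pos + 4] != MAGIC:
            # Not a member header: assume "trailing data" and stop silently.
            break
        (size,) = struct.unpack_from("<I", archive, pos + 4)
        start = pos + 8
        out += zlib.decompress(archive[start:start + size])
        pos = start + size
    return bytes(out)


first = pack_member(b"first half, ")
archive = first + pack_member(b"second half")

corrupted = bytearray(archive)
corrupted[len(first)] ^= 0x01  # flip one bit in the second member's magic

print(lenient_decompress(archive))           # b'first half, second half'
print(lenient_decompress(bytes(corrupted)))  # b'first half, '
```

The second call returns half the data and raises nothing, which is the silent truncation described above.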

It might be possible to identify an lzip block that has its magic number corrupted, e.g. by checking whether the trailing CRC is valid. However, at least at the time I discovered this, lzip’s decompressor made no attempt to do so. It’s possible the behavior has improved in later releases; I haven’t checked.
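As a rough sketch of what such a check could look like, the Python below uses a toy layout again (magic, 4-byte length, zlib payload, CRC32 of the uncompressed data). That layout and the names `looks_like_damaged_member` and `strict_decompress` are my own assumptions for illustration; this is not the real lzip format, and not what lzip does. When the magic does not match, it ignores the magic bytes and tests whether the rest still parses as a member with a valid CRC.

```python
import struct
import zlib

MAGIC = b"LZIP"  # 4-byte magic; the rest of the layout is a toy assumption


def pack_member(data: bytes) -> bytes:
    """Toy member: magic, payload length, zlib payload, CRC32 of the data."""
    payload = zlib.compress(data)
    return (MAGIC + struct.pack("<I", len(payload)) + payload
            + struct.pack("<I", zlib.crc32(data)))


def looks_like_damaged_member(rest: bytes) -> bool:
    """Skip the magic bytes and see whether the rest parses with a valid CRC."""
    if len(rest) < 12:
        return False
    (size,) = struct.unpack_from("<I", rest, 4)
    end = 8 + size
    if end + 4 > len(rest):
        return False
    try:
        data = zlib.decompress(rest[8:end])
    except zlib.error:
        return False
    (crc,) = struct.unpack_from("<I", rest, end)
    return zlib.crc32(data) == crc


def strict_decompress(archive: bytes) -> bytes:
    out = bytearray()
    pos = 0
    while pos < len(archive):
        rest = archive[pos:]
        if rest[:4] != MAGIC:
            if looks_like_damaged_member(rest):
                raise ValueError(f"corrupt member header at offset {pos}")
            break  # nothing member-like here: treat it as trailing data
        (size,) = struct.unpack_from("<I", rest, 4)
        data = zlib.decompress(rest[8:8 + size])
        (crc,) = struct.unpack_from("<I", rest, 8 + size)
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC mismatch in member at offset {pos}")
        out += data
        pos += 8 + size + 4
    return bytes(out)


first = pack_member(b"first half, ")
archive = first + pack_member(b"second half")
print(strict_decompress(archive + b"some trailing notes here"))

corrupted = bytearray(archive)
corrupted[len(first)] ^= 0x01  # same single-bit flip in the second magic
try:
    strict_decompress(bytes(corrupted))
except ValueError as err:
    print("error:", err)
```

Genuine trailing data is still accepted, but the single-bit flip now produces an error instead of a short output. A real implementation would have to do the same using the actual lzip member trailer.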

But at least at the time this article was written: pot, meet kettle.

3 comments

The maintainer's response when I reported this bug was 'Just use "lzip -vvvv" to see the warning':

https://lists.debian.org/55C0FE82.7050700@gnu.org

Their advocacy in this thread was so good that I removed lzip from my system.

Isn't that an implementation problem? I would expect a decompressor to warn that there's unidentified trailing data and perhaps dump it out as-is. After all, even if you did put it there on purpose, surely you still want it rather than having it discarded.
If the claims in the article are true, who cares if the competing thing that the author is working on is also shit (but good to know that too).