| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by chenxiaolong 1019 days ago

What I don't understand is the use case for data descriptors. If the entire file is available, then the central directory is sufficient. If the file is being streamed, then there's no way to know with certainty where the data descriptor is, especially for a scenario like an uncompressed zip stored within a zip.

I suppose a parser could try to compare the CRC and uncompressed size fields every time it encounters the data descriptor magic bytes until it finds one that matches, but the magic bytes aren't even mandatory. The data descriptor can just be CRC + compressed size + uncompressed size with no marker.

From basic testing, `unzip`, libarchive/bsdtar, and the Python zipfile module all choke on files like this when read from a stream (as expected).

Why were data descriptors ever included in the spec?