Hacker News
by pixelglow 3477 days ago
Mark Adler, he of zlib/gzip/Info-ZIP fame, seems to think that zips cannot contain arbitrary data before and between individual files.

http://stackoverflow.com/a/12393597/60910

Therefore the straightforward way to parse a zip file is to start at the beginning and parse out each file sequentially. The End of Central Directory record (and the central directory it points to) is then only a redundant convenience: it allows random access, e.g. in large zips, without sequentially scanning every file.
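
To make the sequential idea concrete, here is a rough Python sketch that walks the local file headers from the start of a stream. The name `iter_local_entries` is made up for illustration, and the sketch assumes the simple case only: no data descriptors (general purpose flag bit 3 clear), no ZIP64, no encryption, and only the stored and deflate methods.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a local file header (little-endian).
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def iter_local_entries(stream):
    """Yield (name, data) per entry, reading strictly front to back.

    Hypothetical helper: assumes sizes and CRC are present in the local
    header, i.e. no data descriptor, no ZIP64, no encryption.
    """
    while True:
        fixed = stream.read(LOCAL_HEADER.size)
        if len(fixed) < LOCAL_HEADER.size or fixed[:4] != LOCAL_SIG:
            return  # central directory or end of input reached
        (_sig, _version, flags, method, _time, _date,
         crc, csize, _usize, name_len, extra_len) = LOCAL_HEADER.unpack(fixed)
        if flags & 0x08:
            raise ValueError("data descriptor in use; sizes follow the data")
        # Flag bit 11 marks a UTF-8 name; otherwise the legacy default is CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = stream.read(name_len).decode(encoding)
        stream.read(extra_len)  # skip the extra field
        raw = stream.read(csize)
        if method == 0:      # stored
            data = raw
        elif method == 8:    # deflate, raw stream without zlib header
            data = zlib.decompress(raw, -15)
        else:
            raise ValueError("unsupported compression method %d" % method)
        if zlib.crc32(data) != crc:
            raise ValueError("CRC mismatch for %r" % name)
        yield name, data
```

Note that the loop never seeks, so it works on a pipe or socket as well as on a file. Archives written with data descriptors would need extra handling, since their sizes are not known from the local header alone.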

1 comments

Interesting. It makes sense to be strict when parsing stream input. Skipping the redundant central directory section hadn't even occurred to me. Bonus points for eliminating the stupid comment confusion dilemma, where the archive comment can itself contain bytes that look like an End of Central Directory signature!

On second thought, not entirely redundant, as the central directory does contain the file permissions (in the external file attributes field, which the local file headers lack). But those can be parsed and set after file extraction, without increasing the overall memory complexity.
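
A minimal sketch of that second pass, assuming the stream is positioned at the first central directory header (which is where a sequential reader ends up once the local entries run out). The name `iter_central_modes` is hypothetical; the sketch only reports Unix mode bits and ignores ZIP64 and multi-disk archives.

```python
import struct

CENTRAL_SIG = b"PK\x01\x02"
# Fixed 46-byte part of a central directory file header (little-endian).
CENTRAL_HEADER = struct.Struct("<4sHHHHHHIIIHHHHHII")
UNIX = 3  # "version made by" host system code for Unix


def iter_central_modes(stream):
    """Yield (name, mode) per central directory entry, one at a time.

    Hypothetical helper: mode is the Unix st_mode value from the upper
    16 bits of the external file attributes, or None if the entry was
    not made on a Unix host.
    """
    while True:
        fixed = stream.read(CENTRAL_HEADER.size)
        if len(fixed) < CENTRAL_HEADER.size or fixed[:4] != CENTRAL_SIG:
            return  # End of Central Directory record or end of input
        (_sig, made_by, _needed, flags, _method, _time, _date,
         _crc, _csize, _usize, name_len, extra_len, comment_len,
         _disk, _internal_attr, external_attr, _offset) = CENTRAL_HEADER.unpack(fixed)
        # Flag bit 11 marks a UTF-8 name; otherwise the legacy default is CP437.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = stream.read(name_len).decode(encoding)
        stream.read(extra_len + comment_len)  # skip extra field and comment
        mode = external_attr >> 16 if made_by >> 8 == UNIX else None
        yield name, mode
```

Each yielded mode could then be applied to the already extracted file with something like `os.chmod(path, mode & 0o7777)`. Only one header is held at a time, so memory use stays constant in the number of entries.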