by sph 818 days ago
Zip is extremely simple, and well documented.

I wrote a ReadableStream-to-Zip encoder (with no compression) in 50 lines of JavaScript.
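
For anyone curious what such an encoder boils down to, here is a minimal sketch in Python rather than JavaScript (it is not the 50-line encoder mentioned above). It assumes every file is already fully in memory, so the CRC and sizes can go straight into the local header; a true streaming encoder would instead set flag bit 3 and append a data descriptor after the data. The function name `make_stored_zip` and the fixed 1980-01-01 timestamp are illustrative choices, and Zip64 is not handled.

```python
import struct
import zlib


def make_stored_zip(files):
    """Build a ZIP archive (method 0, no compression) from a list of (name, bytes) pairs."""
    out = bytearray()      # local headers + file data
    central = bytearray()  # central directory, appended at the end
    dos_time, dos_date = 0, 0x21  # assumed fixed timestamp: 1980-01-01 00:00:00
    flags = 0x0800                # bit 11: file names are UTF-8

    for name, data in files:
        raw_name = name.encode("utf-8")
        crc = zlib.crc32(data) & 0xFFFFFFFF
        offset = len(out)  # where this entry's local header starts

        # Local file header (30 bytes) followed by the name and the raw data.
        out += struct.pack(
            "<IHHHHHIIIHH",
            0x04034B50, 20, flags, 0, dos_time, dos_date,
            crc, len(data), len(data), len(raw_name), 0,
        )
        out += raw_name + data

        # Matching central directory header (46 bytes) followed by the name.
        central += struct.pack(
            "<IHHHHHHIIIHHHHHII",
            0x02014B50, 20, 20, flags, 0, dos_time, dos_date,
            crc, len(data), len(data), len(raw_name),
            0, 0, 0, 0,  # extra length, comment length, disk number, internal attributes
            0, offset,   # external attributes, offset of the local header
        )
        central += raw_name

    cd_offset = len(out)
    out += central
    # End of central directory record (22 bytes, no archive comment).
    out += struct.pack(
        "<IHHHHIIH",
        0x06054B50, 0, 0, len(files), len(files), len(central), cd_offset, 0,
    )
    return bytes(out)
```

The result can be cross-checked against an independent reader, for example by opening it with Python's `zipfile` module and calling `testzip()`.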

2 comments

Me too, but in PHP. I couldn't find a streaming zip encoder that you can just require() and use without further hassle, so I wrote one (it's on GitHub somewhere).

The problem is that zip is finicky and extremely poorly documented. I had to look at what other implementations do to figure out some of the fields. For at least one field, the spec (from the early 90s or late 80s, I think) says it is up to you to decide what to put there! After all that, I also wrote my own docs in case someone coming after me needs to understand the format, but some things are just assumptions and "everyone does it this way"s, so I have only moderate confidence that I've followed the spec correctly. I haven't found incompatibilities yet, but I wouldn't be surprised if an old decoder chokes on my output or if a modern one made a different choice somewhere.

I have also come across third-party zip files that the Debian command-line tool wouldn't open but that the standard Debian/Cinnamon GUI utility was perfectly happy with. If the format were so well documented and standardized, that shouldn't happen. (Similarly, customers on macOS can't open our encrypted 7z pentest report files. The Finder asks for the password and then flat-out tells them "incorrect password", whereas in reality it seems to be unable to handle filename encryption. I don't know if that is per the spec, but incompatibilities abound.)

The PKWARE Zip file spec is reasonably detailed.

If you're not sure what the spec is trying to say, then either the PKZip binaries or the Info-ZIP zip/unzip source code is your usual source of truth.

When one unzip app works but another doesn't, you can usually point the finger at the last zip app that modified the file: it left some inconsistency in the archive.

Running "unzip -t -v" on the zip file in question may yield more info about the problem.
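
One kind of inconsistency can also be checked programmatically: a local file header that disagrees with the central directory entry pointing at it. Some readers trust one, some the other, which is a plausible (though not the only) reason two tools disagree about the same file. The sketch below is an illustration, not what unzip does internally; the name `find_header_mismatches` is made up, and it assumes the archive fits in memory and does not use Zip64.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")  # fixed 30-byte part of a local file header


def find_header_mismatches(blob):
    """Return (name, field) pairs where a local header disagrees with the central directory."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:  # zipfile parses the central directory
        for info in zf.infolist():
            fields = LOCAL_HEADER.unpack_from(blob, info.header_offset)
            sig, _, flags, method, _, _, crc, csize, usize, name_len, _ = fields
            if sig != 0x04034B50:
                problems.append((info.filename, "signature"))
                continue

            name_start = info.header_offset + LOCAL_HEADER.size
            codec = "utf-8" if flags & 0x0800 else "cp437"
            local_name = blob[name_start:name_start + name_len].decode(codec, "replace")
            if local_name != info.orig_filename:
                problems.append((info.filename, "file name"))
            if method != info.compress_type:
                problems.append((info.filename, "compression method"))

            # With flag bit 3 set, CRC and sizes live in a trailing data
            # descriptor and the local header fields may legitimately be zero.
            if not flags & 0x0008:
                if crc != info.CRC:
                    problems.append((info.filename, "crc32"))
                if (csize, usize) != (info.compress_size, info.file_size):
                    problems.append((info.filename, "sizes"))
    return problems
```

An empty list means the two copies of the metadata agree for every entry; it says nothing about whether the file data itself is intact, which is what a test run of the extractor covers.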

The binaries you refer to as the source of truth are a paid product and don't seem to run on my operating system. (I'm not sure whether the trial version, which requires filling out a form that's currently not loading, includes all options, how honest it would be to use it to build an alternative to their software, or whether the terms allow that.) I guess I could buy a Windows license and read the PKZIP EULA to see if you're allowed to use it for making a clone, but I figured the two decoders I had on hand (which don't always agree with each other) would do. If they agree about a field, it's good enough (and decoders can expect that unspecced fields are garbage).

Info-ZIP is open source. Have you never used unzip?

Isn't PKZIP the original? I'm not sure I've heard of Info-ZIP, but unzip is a command I use regularly on Debian. I highly doubt that's the original commercial implementation, though.

Here's the link to the PKWARE APPNOTE.TXT:

https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

The only special thing about the Zip file format that springs to mind as causing ambiguity is the handling of the OS-specific extra field for a Zip archive entry.

You don't have to include an OS-specific extra field unless you want the information in that extra field to be available to the party extracting the contents of the zip file.
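
For reference, the extra field is just a sequence of records, each a 2-byte header ID, a 2-byte data size, and then that many bytes of data, so a reader can skip any record it does not understand. A minimal Python sketch of walking that structure follows; the function name `parse_extra_field` and the small ID table are illustrative, and the table is far from complete.

```python
import struct

# A few commonly seen header IDs (illustrative subset only).
KNOWN_IDS = {
    0x0001: "Zip64 extended information",
    0x000A: "NTFS timestamps",
    0x5455: "extended timestamp",
    0x7875: "Info-ZIP Unix UID/GID",
}


def parse_extra_field(extra):
    """Split a raw extra field into a list of (header_id, payload) records."""
    records = []
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        payload = extra[pos + 4:pos + 4 + size]
        if len(payload) < size:
            raise ValueError("truncated extra field record")
        records.append((header_id, payload))
        pos += 4 + size  # jump to the next record, known ID or not
    return records
```

With Python's `zipfile`, the raw bytes to feed in are available as `ZipInfo.extra`, and `KNOWN_IDS.get(header_id, "unknown")` gives a label for each record.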

Wait until you add support for encryption.