Hacker News
by cxr 1731 days ago
I recommended exploring this approach here <https://news.ycombinator.com/item?id=27998733>:

> Hot tip for handling office file formats or anything that uses a ZIP container: just unzip them and commit _that_ to the repo.

Even modern (zipped XML-based) office file formats do make some limited use of binary blobs. You can either keep these intact, or write a small objdump-like tool that serializes them to text†. For portability, it might be best to write the serializer/deserializer in JS dumped into a thin HTML wrapper, so pretty much anyone can double-click to "run" it. (My experiments with including that file in the ZIP container itself yielded poor round-trip results.)
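
For the unzip-and-commit part, here is a rough sketch in Python (standard library only). The function names `unpack` and `repack` are mine, not from any existing tool, and I'm assuming the member order is worth preserving on the way back. Some container formats are picky about member order or compression, so don't expect the repacked file to be byte-identical to the original.

```python
import zipfile
from pathlib import Path


def unpack(container: str, out_dir: str) -> list[str]:
    """Extract every member of a ZIP-based file into out_dir.

    Returns the member names in archive order so that repack()
    can write them back in the same order.
    """
    with zipfile.ZipFile(container) as zf:
        names = zf.namelist()
        zf.extractall(out_dir)
    return names


def repack(src_dir: str, container: str, names: list[str]) -> None:
    """Rebuild a ZIP container from an unpacked tree, in the given order."""
    with zipfile.ZipFile(container, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname keeps the path inside the archive independent of src_dir
            zf.write(Path(src_dir) / name, arcname=name)
```

You would commit the contents of `out_dir` and run `repack` whenever you need the actual document again. Only unpack files you trust, since `extractall` writes whatever member paths the archive declares.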

† I've used this strategy for Oberon .rsc binaries. Due to Wirth's affinity for single-pass compilers, the Oberon toolchain doesn't involve a discrete assembler or AOT linker tool, so there is no assembly format or linker scripts. However, Wirth's distribution of the Oberon system does have an ORTool utility <https://people.inf.ethz.ch/wirth/ProjectOberon/Sources/ORToo...> (in the vein of objdump/readelf/nm) that will dump a textual description of the binary you give it. I realized that with some slight tweaks, you can use the output of ORTool.DecObj as a de facto "assembly" format—just write a tool capable of parsing it and then write out the corresponding binary.
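
To make the round-trip idea concrete, here is a minimal sketch in Python. This is not ORTool's output format; the "offset: hex bytes" line layout and the names `dump` and `load` are made up for illustration. The point is only that a textual dump doubles as a source format once you have a parser that turns it back into the same bytes.

```python
def dump(blob: bytes, width: int = 16) -> str:
    """Serialize bytes as text: one 'offset: hex bytes' line per `width` bytes."""
    lines = []
    for off in range(0, len(blob), width):
        chunk = blob[off:off + width]
        lines.append(f"{off:08x}: {chunk.hex(' ')}")
    return "\n".join(lines) + "\n"


def load(text: str) -> bytes:
    """Parse the output of dump() back into the original bytes."""
    out = bytearray()
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        offset, _, hexpart = line.partition(":")
        # The offset column is redundant, so use it as a sanity check.
        if int(offset, 16) != len(out):
            raise ValueError(f"unexpected offset {offset!r}")
        out += bytes.fromhex(hexpart)  # fromhex skips the spaces between bytes
    return bytes(out)
```

A format-aware dump (named fields, decoded instructions) diffs far better than raw hex, but the contract is the same: `load(dump(blob)) == blob`.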

1 comment

>> Hot tip for handling office file formats or anything that uses a ZIP container: just unzip them and commit _that_ to the repo.

What is the point of that? I think neither the binary nor the XML output would be meaningful in a diff.