by KolmogorovComp 900 days ago
Whenever I see stories like that, I always wonder if anyone has succeeded at parsing an undocumented file format that included a custom compression scheme.

Parsing a binary file is tedious, but at least you can progress steadily, whereas with custom compression you would never be sure you had even decompressed correctly before trying to decode the format.

Fortunately, this is mostly a theoretical problem. There are very few cases where custom compression would be more efficient than slapping a .zip/.zstd/.tar on it if it ever gets too big.


There was a story about reverse engineering high-speed Broadcom network card firmware on HN last week. That included custom compression, if I remember correctly.

This is the post: https://news.ycombinator.com/item?id=38772862

Thanks for the link! Impressive work indeed. Relevant snippets:

> but had no idea as to how the image was compressed. It clearly wasn't compressed with any common compression algorithm. Mercifully unlike the MIPS firmware, it had at least a few strings, which is how I was able to tell it was compressed; a hex dump showed chunks of human-readable text with garbage interrupting them.

> A hunch. After extensive amounts of time trying and failing to eyeball the compression algorithm from hexdumps of compressed code, and trying any decompression algorithm I could think of against it,

But they eventually broke through by reverse engineering the decompression code.

> Once I finally had a concise, sane description of the decompression algorithm in C, the algorithm turned out to be hilariously simple. I was also then able to figure out the origins of the compression algorithm; it's called LZSS
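
For anyone wondering what "hilariously simple" looks like in practice, here is a minimal sketch of an LZSS decoder in Python. It assumes the parameters of the classic textbook variant (4096-byte ring buffer pre-filled with spaces, flag bytes read LSB-first, 12-bit offset plus 4-bit length). The firmware in the linked post may well use different parameters, so treat this as an illustration and not as the author's code; the name `lzss_decompress` is made up for this example.

```python
WINDOW_SIZE = 4096   # assumed ring buffer size (12-bit offsets)
MAX_MATCH = 18       # assumed longest match (4-bit length + MIN_MATCH)
MIN_MATCH = 3        # assumed shortest match worth encoding


def lzss_decompress(data: bytes) -> bytes:
    """Decode one common LZSS variant (parameters above are assumptions)."""
    window = bytearray(b" " * WINDOW_SIZE)
    pos = WINDOW_SIZE - MAX_MATCH  # classic starting write position
    out = bytearray()
    i = 0

    def emit(byte: int) -> None:
        # Every output byte is also written into the ring buffer.
        nonlocal pos
        out.append(byte)
        window[pos] = byte
        pos = (pos + 1) % WINDOW_SIZE

    while i < len(data):
        flags = data[i]
        i += 1
        for bit in range(8):
            if i >= len(data):
                break
            if flags & (1 << bit):
                # Flag bit 1: the next byte is a literal, stored verbatim.
                emit(data[i])
                i += 1
            else:
                # Flag bit 0: two bytes encode (offset, length) into the window.
                if i + 1 >= len(data):
                    return bytes(out)  # truncated reference, stop here
                lo, hi = data[i], data[i + 1]
                i += 2
                offset = lo | ((hi & 0xF0) << 4)
                length = (hi & 0x0F) + MIN_MATCH
                for k in range(length):
                    emit(window[(offset + k) % WINDOW_SIZE])
    return bytes(out)


# Hand-built stream: three literals "abc", then a 6-byte back-reference.
sample = bytes([0x07]) + b"abc" + bytes([0xEE, 0xF3])
print(lzss_decompress(sample))
```

Note that literals sit in the stream unmodified, with flag bytes and offset/length pairs in between. That fits the hex dump described above: readable text with garbage interrupting it.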