Hacker News new | ask | show | jobs
by DanBC 3949 days ago
That's impressive. Here are some other compression curiosities.

http://www.maximumcompression.com/compression_fun.php

A 24 byte file that uncompresses to 5 MB; another file with good compression under RAR but almost no compression under ZIP; and a compressed file that decompresses to itself.

5 comments

This isn't a decompression bomb, but here are some fun virtual disk images I found using AFL fuzzer. One of the files is 329 bytes, but causes qemu to consume 4 GB of heap trying to process it. This has interesting consequences for the public cloud, where people can upload any old stuff and it is usually processed immediately by 'qemu-img'.

https://bugs.launchpad.net/qemu/+bug/1462949

(I have a big collection of these, but most of the bugs have now been fixed in qemu)

The file that decompresses to itself is a work of art. How does one even go about to create something like this?
Like this: http://research.swtch.com/zip. Fascinating read.
Written by THE Russ Cox?
I have no idea.

There are people who take special interest in compression. The Hutter Prize would be a good place to find them. http://prize.hutter1.net/

It's a bit old now.

> Restrictions: Must run in <10 hours on a 2GHz P4 with 1GB RAM and 10GB free HD

I'd be interested to see what can happen without those restrictions.

> I'd be interested to see what can happen without those restrictions.

If you did not have time limits you could exhaustively search through Pi looking for your data, then store just the offset into Pi.

Or if the offset got too large then Pi to the power of lots of random numbers. One of those numbers will, somewhere, have your data. But it's infeasible to search for it.

A common misconception.

There is no universal compression algorithm, by the pigeonhole principle.

And your compression scheme won't work by the simple fact that the offset of where in Pi your data lies will, on average, take the same number of bits to store as the data itself. Ditto, with pi to the power of lots of random numbers, the offset and the power will, on average, take the same number of bits to store as the data itself.

I'll add mine: https://brage.info/hello

It's a 1MB file that decompresses to 261 tredecillion bytes of "Hello, World".

No terribly clever stream manipulation; it's a perfectly normal gzip file, other than the size. The generation script is here: http://sprunge.us/VhFc, but see if you can figure it out without peeking.

Warning to my fellow chumps: that is a direct link to the gzip bomb. Clicking on it causes a download, which results in Windows Defender promptly losing its mind as it tries to decompress the whole thing looking for viruses. (win 8.1)

Interestingly enough, it's not using up any memory, but it is hitting the disk at 55MB/s, so I'm guessing it's decompressing it to disk, and will eventually crash when it fills up my hard drive.

EDIT: It gave up after fifteen minutes. That's nice, now I don't have to reboot. There was about 2 gigabytes of stuff in my temp folder when I ran disk cleanup, don't know how much of that was gzip bomb output.

May I suggest for the merely curious:

    curl https://brage.info/hello | od -c | head -40
Is that short or long scale tredecillion?
Actually, it decompresses into a 1.4MiB file, which decompresses into another 1.4MiB file, recursively.
I sized it to fit on a floppy disk. :-3

But, no. There's a finite number of levels, and it blows up pretty quickly.

Malware scanners that search inside archives really love zips that decompress recursively forever.
Here's a fascinating thought.

It should be possible to write an archive file that decompresses to a slightly different archive file. Say, the original archive file plus a zero byte. (Things get hairy because of the checksum in most archive file formats, but it still should be possible.)

Then a malicious party writes an archive file that decompresses to two slightly different archive files. And sets it up so that, say, 128 levels in iff you follow a specific path in said tree you get to the actual malicious content. You can't get to it unless you follow the specific path, and you can't find it unless you do substantially more than just naively decompressing files (as you won't be able to decompress 2129-1 files).

Maybe someone takes the recursive file to a new level and creates one that generates to n new identical files. So decompress to a tree.
Better yet: a file the decompresses to multiple slightly different versions of itself. Say, one that is a version of itself with an additional one, and one that is a version of itself with an additional zero.