| HN Mirror



	by 7373737373 2 days ago
	What does it compress the full 1GB file to? http://prize.hutter1.net/

3 comments

spidy__ 2 days ago

I tried it on a 100 MB slice of enwik9 and was able to compress it to 20 MB plus a 900 KB transformer, so about 21 MB in total.

I know the top submission was able to get it to 13 MB.

Still trying some ideas to get better compression.
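For anyone who wants the arithmetic behind those figures, here is a minimal sketch. It assumes decimal units (1 MB = 1,000,000 bytes) and, for the comparison, that the 13 MB figure refers to the same 100 MB input, which the comment does not state. The helper names `total_size` and `bits_per_byte` are made up for illustration.

```python
MB = 1_000_000  # assumption: decimal megabytes
KB = 1_000      # assumption: decimal kilobytes


def total_size(payload_bytes: int, model_bytes: int) -> int:
    """Size that counts: the compressed payload plus the model needed to decode it."""
    return payload_bytes + model_bytes


def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Average number of output bits spent per input byte."""
    return 8 * compressed_bytes / original_bytes


original = 100 * MB
reported = total_size(20 * MB, 900 * KB)  # 20 MB payload + 900 KB transformer

print(f"total: {reported / MB:.1f} MB")
print(f"reported run: {bits_per_byte(reported, original):.3f} bits per byte")
# Assumption: the 13 MB result is for the same 100 MB input.
print(f"13 MB result: {bits_per_byte(13 * MB, original):.3f} bits per byte")
```

Counting the model size matters here because the decoder cannot reconstruct anything without the transformer weights.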

gravypod 5 hours ago

Since you know the size of the file beforehand, you may be able to overfit some kind of text diffusion model instead of a transformer. That may allow you to partially correct the model output using some other method and then fill in the blanks that were wrong in previous generations.

spidy__ 2 hours ago

Oh, sounds interesting. I hadn't considered using a diffusion model for this. My current approach generates the document byte by byte with an autoregressive transformer, so I'm curious how a diffusion model would improve memorization or reconstruction quality.

Can you point me to something I can read? I really want to try this approach; a diffusion model does sound interesting for compression.
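To make the byte-by-byte setup concrete: with any autoregressive predictor, the compressed size is roughly the sum of -log2 p(next byte | previous bytes), which an arithmetic coder can approach to within a few bits. The sketch below is not the transformer from this thread. It swaps in a simple adaptive byte-frequency model as a stand-in, and the function name `ideal_code_length_bits` is hypothetical.

```python
import math
from collections import Counter


def ideal_code_length_bits(data: bytes) -> float:
    """Sum of -log2 p(byte | history) under an adaptive order-0 model.

    Stand-in for a learned model: the probability of each byte is its
    smoothed frequency among the bytes seen so far.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for byte in data:
        # Add-one smoothing over the 256 possible byte values, so an
        # unseen byte still gets a non-zero probability.
        p = (counts[byte] + 1) / (seen + 256)
        total_bits += -math.log2(p)
        # Update the model only after coding, so a decoder can stay in sync.
        counts[byte] += 1
        seen += 1
    return total_bits


if __name__ == "__main__":
    sample = b"abracadabra " * 100
    bits = ideal_code_length_bits(sample)
    print(f"{len(sample)} bytes -> about {bits / 8:.0f} bytes")
```

A better predictor (a transformer, or possibly a diffusion model as suggested above) only changes how `p` is computed; the better it predicts, the smaller the sum gets.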

purple-leafy 11 hours ago

Thanks for the link!

cellular 5 hours ago

Maybe everyone should compress the 1st 100MB worth of digits of pi, for an apples-to-apples comparison?

Edit: oh wait, that's too easy. We'd need to generate and publish random digits so everyone can use them.

saulpw 5 hours ago

random digits aren't compressible though?

SV_BubbleTime 5 hours ago

Random digits are compressible though.

Being random does not mean the data cannot match a pattern in your dictionary, for example.

gnabgib 5 hours ago

No.. they're not. Do you understand random (the apparent or actual lack of definite patterns or predictability[0]) or compression (reduces bits by identifying and eliminating statistical redundancy[1])?

[0]: https://en.wikipedia.org/wiki/Randomness

[1]: https://en.wikipedia.org/wiki/Data_compression
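A quick way to check this empirically is to run a general-purpose compressor over different inputs. The sketch below uses Python's standard `zlib`; the helper names and sample inputs are illustrative. One caveat worth separating out: random decimal digits stored as ASCII text do shrink, but only because each byte uses 10 of its 256 possible values (about log2(10) ≈ 3.32 bits of information per 8-bit byte), not because the digits contain any pattern.

```python
import os
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data, 9)) / len(data)


def sample_inputs(n: int = 200_000, seed: int = 0) -> dict:
    """Three inputs of n bytes each with very different redundancy."""
    rng = random.Random(seed)
    line = b"the quick brown fox jumps over the lazy dog\n"
    return {
        # All 256 byte values equally likely: nothing for zlib to exploit.
        "uniform random bytes": os.urandom(n),
        # Random digits as text: redundancy comes from the encoding only.
        "random ASCII digits": bytes(rng.choices(b"0123456789", k=n)),
        # Highly repetitive text: lots of statistical redundancy.
        "repeated sentence": (line * (n // len(line) + 1))[:n],
    }


if __name__ == "__main__":
    for name, data in sample_inputs().items():
        print(f"{name}: ratio {compression_ratio(data):.3f}")
```

On a typical run the uniform random bytes come out slightly larger than the input, while the repeated sentence shrinks to a tiny fraction of its size.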

IncreasePosts 1 hour ago

Over infinite runs, you can't compress random data, but that doesn't mean any given finite string of random digits is incompressible.
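The standard counting argument puts a number on this. There are 2^n bit strings of length n, but only 2^(n-k+1) - 1 strings of length n-k or shorter, so no lossless scheme can shorten more than that many inputs by k bits or more. A small sketch, with the function name `max_fraction_compressible` made up for illustration:

```python
from fractions import Fraction


def max_fraction_compressible(n_bits: int, saved_bits: int) -> Fraction:
    """Upper bound on the share of n-bit inputs that any lossless
    compressor can shorten by at least `saved_bits` bits."""
    if not 1 <= saved_bits <= n_bits:
        raise ValueError("saved_bits must be between 1 and n_bits")
    # Number of distinct outputs of length 0 .. n_bits - saved_bits.
    shorter_outputs = 2 ** (n_bits - saved_bits + 1) - 1
    # Lossless means distinct inputs need distinct outputs.
    return Fraction(shorter_outputs, 2 ** n_bits)


if __name__ == "__main__":
    for k in (1, 8, 32):
        bound = max_fraction_compressible(1_000, k)
        print(f"save >= {k} bits: at most {float(bound):.3g} of all inputs")
```

So a particular finite random string can be compressible, but the chance of drawing one that shrinks by even a few bytes is tiny, and it does not grow with the length of the string.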

thin_carapace 4 hours ago

By this definition, a random dataset could apparently present no patterns while still presenting non-apparent patterns.

ufocia 4 hours ago

Sounds like presenting no patterns, apparently or otherwise, would be a pattern in itself.

branc116 3 hours ago

Compressor: Output an empty file.

Decompressor: Take any old algorithm for finding digits of pi, find the first 100M of them, print them.

Compression ratio of 0! :0
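A rough sketch of that compressor/decompressor pair, assuming Gibbons' unbounded spigot algorithm as the "any old algorithm" (the comment does not name one). The function names are hypothetical, and this generator is far too slow for 100M digits; it only shows that the decompressor can carry all the information.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, ...) one at a time,
    using Gibbons' unbounded spigot algorithm with exact integers."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is settled: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough precision yet: absorb one more term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


def compress(data: bytes) -> bytes:
    """'Compress' the digits of pi: the output carries no information."""
    return b""


def decompress(blob: bytes, n_digits: int) -> str:
    """Ignore the (empty) blob and regenerate the digits from scratch."""
    return "".join(str(d) for d in islice(pi_digits(), n_digits))


if __name__ == "__main__":
    print(decompress(compress(b"ignored"), 10))  # 3141592653
```

This is why contests such as the Hutter Prize linked at the top count the size of the decompressor along with the compressed file.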
