Since you know the size of the file beforehand you may be able to overfit some kind of text diffusion model instead of a transformer? May allow you to partially correct the model output using some other method and then fill in the blanks that were wrong from previous generations.
Oh, sounds interesting. I hadn't considered using a diffusion model for this. My current approach generates the document byte by byte with an autoregressive transformer, so I'm curious how a diffusion model would improve memorization or reconstruction quality.
Can you point me to something that i can read? I really wanna try this approach , diffusion model does sounds interesting for compression.
No.. they're not. Do you understand random (the apparent or actual lack of definite patterns or predictability[0]) or compression (reduces bits by identifying and eliminating statistical redundancy[1])?
I know the top submission was able to get it to 13 mb.
Still trying some ideas to get better compression.