Hacker News
by david_draco 2031 days ago
I wish there was a semi-compressed transparent filesystem layer which slowly compresses the least recently used files in the background, and un-compresses files upon use. That way you could store more mostly-unused content than the disk's raw capacity, without sacrificing accessibility.
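To make the idea concrete, here is a rough userspace sketch of it, not a real filesystem layer. It assumes access times (st_atime) are trustworthy, which is not the case on volumes mounted with noatime and only approximately the case with relatime. The function names compress_cold_files and open_transparently are made up for illustration.

```python
import gzip
import os
import shutil
import time


def compress_cold_files(root, max_idle_seconds, now=None):
    """Gzip every regular file under root not accessed for max_idle_seconds.

    Returns the list of .gz paths that were created.
    """
    now = time.time() if now is None else now
    created = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".gz"):
                continue  # already compressed by an earlier pass
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks and anything that is not a plain file
            idle = now - os.stat(path).st_atime
            if idle < max_idle_seconds:
                continue  # recently used, leave it uncompressed
            target = path + ".gz"
            with open(path, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)
            created.append(target)
    return created


def open_transparently(path):
    """Open path for reading, restoring it from path + '.gz' first if needed."""
    if not os.path.exists(path) and os.path.exists(path + ".gz"):
        with gzip.open(path + ".gz", "rb") as src, open(path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path + ".gz")
    return open(path, "rb")
```

A real implementation would have to live below the application (FUSE or in the kernel) so that every program sees the uncompressed file, and it would need to cope with crashes halfway through a conversion. This sketch ignores both.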
5 comments

I don't know about you guys, but most of the stuff that takes up space on my drives is:

1) Videos from my DSLR

2) RAW images from my DSLR

3) Various movies / TV series I downloaded

4) Game files (most of which are textures and 3D models)

None of that stuff is really compressible.

The RAWs aren't compressible? Are they LZ encoded on the camera?
RAW imagery I have worked with is about 25-50% losslessly compressible on average. Most of your gains in image compression are from quantizing the gamut in clever, (usually) imperceptible ways.

Raw imagery contains a lot of entropy that usually doesn't affect the appearance perceptually, but still has sig figs that frustrate compression.
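A toy illustration of that point, using synthetic data rather than real RAW files: the samples below are a smooth 16-bit ramp, optionally with random bits in the low byte. The helper names and the choice of zlib are assumptions for the demo only.

```python
import random
import zlib


def synthetic_samples(n, noise_bits, seed=0):
    """Return n little-endian 16-bit samples: a slow ramp plus low-bit noise."""
    rng = random.Random(seed)
    out = bytearray()
    for i in range(n):
        signal = (i * 255 // n) << 8  # slowly rising ramp in the high byte
        noise = rng.getrandbits(noise_bits) if noise_bits else 0
        out += (signal | noise).to_bytes(2, "little")
    return bytes(out)


def compressed_size(data):
    """Size in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


clean = synthetic_samples(20000, noise_bits=0)
noisy = synthetic_samples(20000, noise_bits=8)
print(len(clean), compressed_size(clean), compressed_size(noisy))
```

The two buffers would look almost identical as images, but the noisy one compresses far worse, because the random low bits cannot be predicted by a lossless coder.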

My Fuji has lossless RAW compression and, as far as I know, so do quite a few others.
I just tried compressing one with 7-zip ultra level compression. Saved maybe 5%. Wouldn't get even that with realtime compression.
You could probably build something easily in nbdkit to do this. (Note this is at the block layer). An advantage of nbdkit is you could write the whole thing in the high-level language of your choice, even a scripting language such as Python, which might make it easier to rapidly explore designs.

Having said that I did try to implement a deduplication layer for nbdkit, but what I found was that it wasn't very effective. It turns out that duplicate data in typical VM filesystems isn't common, and the other parts of the filesystem (block free lists etc) were not sufficiently similar to deduplicate given my somewhat naive approach.
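For anyone who wants to check how much duplication their own disk image has before building anything, a quick estimate is to hash fixed-size blocks and count the unique ones. This is only a sketch of a generic fixed-block approach, not the nbdkit layer described above; the function name and the 4096-byte block size are assumptions.

```python
import hashlib


def dedup_stats(data, block_size=4096):
    """Return (total_blocks, unique_blocks) for fixed-size block deduplication."""
    seen = set()
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        seen.add(hashlib.sha256(block).digest())
        total += 1
    return total, len(seen)


# Three identical blocks plus one different block: 4 blocks, 2 unique.
sample = b"A" * 4096 * 3 + b"B" * 4096
print(dedup_stats(sample))
```

If unique_blocks is close to total_blocks, fixed-block deduplication will not save much, which matches the experience reported above.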

I believe the term of art that applies here is "Hierarchical Storage Management". It normally means automatically moving data between high-cost and low-cost storage media, but for the kind of compression you described, the low-cost tier can simply be a fast disk with a compressing filesystem.
I believe NT file compression works like this, and before that MS-DOS "DriveSpace" ...
NTFS requires that files be manually converted to the compressed format. They're uncompressed in parts as requested, but this is only kept in RAM. I'm not aware of any built-in background task that converts files to/from the compressed format.
You can set the "Compressed" flag of a folder and from then on everything in that folder will be compressed/decompressed transparently. I have most of my disk compressed that way and have never seen problems.
That's cool, I hadn't thought of that. But does it identify "hot"/"cold" files that might benefit from being converted automatically to/from the compressed format? That would be a very nice feature to have.
Check out CVMFS

It is not what you describe but it can help.