Hacker News
by dj_mc_merlin 2031 days ago
> I started working on DwarFS in 2013 and my main use case and major motivation was that I had several hundred different versions of Perl that were taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive keeping them around for when I happened to need them.

It fills me with joy that someone has been coding a fs for 7 years due to perl installs taking too much space. Necessity is the mother of all invention.


Hahaha, I haven't actually been coding on this for that long, it's more that I coded for a few weeks back in 2013 and only found the motivation to resurrect the whole thing a few weeks back.
Funny thing is that I've got a similar problem powering https://perl.bot/ (and the associated IRC bot). I don't have as many installs as you currently, but it's not far off and I want to add more compile-time settings to them. I'd need to set up a full build server/system though, because I need to regularly update them with new modules.

How opposed would you be to this being reworked so that it could be supported in the mainline kernel too?

> How opposed would you be to this being reworked so that it could be supported in the mainline kernel too?

I don't see any way of getting this anywhere near the kernel without a full rewrite. It's C++ and it depends on libraries that aren't even shipped by a lot of distributions (folly & fbthrift). And, tbh, I don't see much benefit given that FUSE these days doesn't seem to be significantly worse in terms of performance.

> I'd need to set up a full build server/system though, because I need to regularly update them with new modules.

Overlay the mounted read-only fs with a read-write fs. Then you can install modules as you like and if you want to start fresh, just throw away the read-write fs. That's what I've done in the past.
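
A minimal sketch of that setup, assuming the overlay in question is Linux overlayfs (the comment does not say which one was used). The helper name and all paths are hypothetical. It only builds the `mount` command and does not run it, since mounting needs root privileges.

```python
import shlex


def overlay_mount_command(lower, upper, work, merged):
    """Build (but do not run) a mount command for an overlayfs stack.

    lower  -- read-only layer, e.g. a mounted DwarFS image
    upper  -- writable directory that receives every change
    work   -- empty scratch directory on the same filesystem as `upper`
    merged -- mount point that presents the combined view
    """
    options = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["mount", "-t", "overlay", "overlay", "-o", options, merged]


# Hypothetical paths, for illustration only.
cmd = overlay_mount_command(
    lower="/mnt/perl-ro",
    upper="/var/tmp/perl-rw/upper",
    work="/var/tmp/perl-rw/work",
    merged="/opt/perl",
)
print(shlex.join(cmd))
```

To start fresh under this scheme, unmount the merged directory and delete the upper and work directories. The read-only layer is never modified.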

Is it possible to rebuild a DwarFS fs to incorporate changes from an overlay fs without decompressing, then recompressing?

It seems feasible that a second DwarFS fs could be built from an overlay/DwarFS, then delete the original overlay/DwarFS fs. That would require 2N storage as the new DwarFS is being built. Is it possible to patch an existing DwarFS?

By overlay, are you referring to overlayfs [0]?

[0] https://wiki.archlinux.org/index.php/Overlay_filesystem

> Overlay the mounted read-only fs with a read-write fs. Then you can install modules as you like and if you want to start fresh, just throw away the read-write fs. That's what I've done in the past.

It would be nice to be able to build a new read-only filesystem in incremental mode: given a compressed filesystem and some new uncompressed data, incorporate the uncompressed data without completely re-doing all the work.

> "taking up something around 30 gigabytes of disk space, and I was unwilling to spend more than 10% of my hard drive"

I imagine these days you have more than 300 GB of hard disk space, making this all moot?

256GB SSDs are still everywhere.
Nowadays you can have the same problem with Python and JavaScript too!
Same with Ruby (Gems).
I have much the same problem as mhx: several hundred huge Perl versions that are almost identical, taking up enormous amounts of disk space. I had to move most of them from my SSD to a spinning disk, and I really want to move them back.

Thanks to mhx I can now move them back to my fast disk. This is also perfect for testers.

If they're almost the same, could you use one git repo with different branches for each version? Or archive them with restic into a folder and restore which one you need each time. Either method should deduplicate data if they're mostly the same file structure and content.
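
As a rough illustration of why that helps, here is a sketch that estimates how much space whole-file deduplication would save across a set of install trees. It is a simplification and not how either tool works internally: git stores each distinct file content once by hash and can additionally delta-compress, while restic deduplicates at the level of content-defined chunks. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedup_stats(root):
    """Return (total_bytes, unique_bytes) for regular files under root.

    Files with identical content count once towards unique_bytes.
    Symlinks are skipped.
    """
    seen = set()
    total = unique = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            digest = file_digest(path)
            total += size
            if digest not in seen:
                seen.add(digest)
                unique += size
    return total, unique
```

Running `dedup_stats` on a directory that holds all the installs gives a lower bound on the savings. Files that differ by only a few bytes still count as unique here, which is where chunk-level or segment-level deduplication does better.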

Edit: You could even have several read-only shadow copies of the repo for parallel working directory usage, if you hard link the .git directory except for the HEAD ref in each.

Nice, I wonder how this compares with MongoDB's compression of files and objects. Seems like a great foundation for archiving data.