by hultner 1177 days ago
Very cool and probably a fun exercise, but I would probably put the data on a ZFS volume with dedupe instead, which from reading this implementation is pretty much the same thing to my layman’s perspective. I could also add compression to the same dataset.
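
For readers who want to see the idea behind block-level dedupe, here is a minimal toy sketch in Python. It is not ZFS's implementation and not the article's either; the `DedupStore` class, the 4096-byte block size, and the file names are all hypothetical, chosen only to illustrate that identical blocks are stored once and referenced many times.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for this toy; not a ZFS default


class DedupStore:
    """Toy content-addressed block store: identical blocks are stored once."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes (stored once)
        self.refcount = {}  # digest -> number of references to that block
        self.files = {}     # file name -> ordered list of block digests

    def write(self, name, data):
        # Note: overwriting an existing name is not handled in this toy.
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # first time we see this content
            self.refcount[digest] = self.refcount.get(digest, 0) + 1
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
store.write("game_v1.bin", payload)
store.write("game_v1_copy.bin", payload)
logical_bytes = 2 * len(payload)
print(logical_bytes, store.stored_bytes())
```

The mapping from digest to block plays the role that the dedupe table plays in the discussion below: it has one entry per unique block, which is why its size grows with the amount of unique data and shrinks with larger blocks.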

> I would probably put the data on a ZFS volume with dedupe instead

But doesn't ZFS dedupe eat RAM for breakfast? Double-digit GB of RAM per TB of data, IIRC?

The DDT (deduplication table) used to require ~1GB RAM for every 1TB data written over the life of the file system. Deleting data from the file system wouldn't remove the dedupe references, you'd have to recreate the pool entirely and start over with a new DDT.

However, there are now special devices that can be used to store the DDT. Typically this is done with two SSDs configured as a mirrored vdev for the DDT metadata. This reduces the overhead on memory, but does cost some performance and still has the same limitation that the DDT size can only be reduced by re-creating the pool.

Yes, this is what I do, and I wouldn't enable dedupe for all datasets. For instance, in this case he wanted dedupe on his games, so I would just enable it for that dataset.

But I do run disk-wide compression with no problems and have done so on all my datasets for many years now, and it's been a tremendous space saver. Especially on machines where I have a lot of VMs/containers, it's not unusual for me to have a compression ratio of 2 on these with good old lz4. It will be interesting to see what damage zstd will do once I start experimenting with that.

+1 for compression! Unless you have very specific performance needs, there's no reason not to use compression. In fact TrueNAS (previously known as FreeNAS) has it enabled by default.

I've yet to compare lz4 to zstd myself, but I've read great things about zstd.

No, only for deduplication, which is optional and doesn't make sense for most workloads. And usage depends on record (like block) size and other factors. Usually 1GB per TB of deduplicated storage is the max. ZFS works on datasets for configs like this, so you can dedup a small "filesystem" of data but not all 40TB on a pool.
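
To make the record-size dependence concrete, here is a rough back-of-the-envelope sketch. It assumes one DDT entry per unique block and roughly 320 bytes of memory per entry; that per-entry figure is a commonly quoted approximation, not something stated in this thread, and the function name `ddt_ram_bytes` is made up for illustration.

```python
DDT_ENTRY_BYTES = 320  # assumption: commonly quoted approximate size per DDT entry

KIB = 1024
GIB = 1024 ** 3
TIB = 1024 ** 4


def ddt_ram_bytes(unique_data_bytes, avg_block_bytes, entry_bytes=DDT_ENTRY_BYTES):
    """Estimate DDT size assuming one entry per unique block."""
    unique_blocks = -(-unique_data_bytes // avg_block_bytes)  # ceiling division
    return unique_blocks * entry_bytes


for block_kib in (16, 128, 1024):
    estimate = ddt_ram_bytes(1 * TIB, block_kib * KIB)
    print(f"{block_kib:>5} KiB blocks: {estimate / GIB:.2f} GiB of DDT per TiB")
```

Under these assumptions, 1 MiB blocks come to well under 1 GiB per TiB, 128 KiB blocks to about 2.5 GiB per TiB, and small 16 KiB blocks to about 20 GiB per TiB. That spread may be why both the "about 1GB per TB" and the "double-digit GB per TB" figures get quoted.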

It will use lots of RAM for ARC (adaptive replacement cache) but that can be limited.

ZFS RAM usage is greatly overblown in my opinion and experience.

Agreed. People forget how much more RAM we have now versus when ZFS was developed. It's like old emacs swapping jokes.
It's usually along the lines of 1GB per TB. Some factors can affect that number but I've found it pretty accurate. Note that's 1GB of RAM that ends up wired to ZFS and never usable by the rest of the system.
My recent 'copy-on-write' foot gun was using ZFS in Ubuntu. There seems to be some automatic config that makes ZFS snapshots each time you apt install something. This led to me wondering why on earth, no matter how many files I deleted, my disk usage was not dropping from 100%. All those deleted files still existed in the snapshots. This was coupled with some bug that makes the snapshots report their size as much smaller than they really were.
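
The effect described above can be illustrated with a small toy model. This is only a sketch of the general snapshot idea, not of ZFS internals: `ToyPool`, the file name, and the snapshot name are hypothetical, sizes are arbitrary units, and files are never modified in place.

```python
class ToyPool:
    """Toy model: space is freed only when nothing references a file's data."""

    def __init__(self):
        self.live = {}       # file name -> size (the live filesystem)
        self.snapshots = {}  # snapshot name -> frozen copy of the file table

    def write(self, name, size):
        self.live[name] = size

    def delete(self, name):
        del self.live[name]

    def snapshot(self, snap_name):
        self.snapshots[snap_name] = dict(self.live)

    def destroy_snapshot(self, snap_name):
        del self.snapshots[snap_name]

    def used_bytes(self):
        # Space is held by every file visible from the live filesystem
        # or from any snapshot.
        referenced = dict(self.live)
        for table in self.snapshots.values():
            referenced.update(table)
        return sum(referenced.values())


pool = ToyPool()
pool.write("big.iso", 50)
pool.snapshot("auto-before-install")   # hypothetical automatic snapshot
pool.delete("big.iso")
used_after_delete = pool.used_bytes()  # the snapshot still holds the data
pool.destroy_snapshot("auto-before-install")
used_after_destroy = pool.used_bytes()
print(used_after_delete, used_after_destroy)
```

In this model, deleting the file changes nothing until the snapshot that still references it is destroyed, which matches the behaviour described in the comment.
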
ZFS is fun, but it sometimes locks up for unexplained reasons.
I don't know if it's the same issue you have in mind, but I can 100% reproduce ZFS hanging and thinking a disk is unusable.

Steps to reproduce:

* Get an external HDD. I literally bought a new, different external HDD because I thought the problem was the old one (spoiler: nope, the problem still could be reproduced 100% on the new disk).

* Create a zpool for the whole disk.

* Create a dataset.

* Try to rsync several hundred GB to the ZFS dataset.

* Wait for a minute or two.

* Notice how it stops transferring data, and gives a weird error complaining the disk is unhealthy, faulty, or something (I don't remember the exact terms).

No amount of `zpool clear` or `zpool scrub` will fix it. I gave up and just formatted it as ext4 like all my other backup disks.

My use case for this was having this external HDD as a backup. The plan was to format this as ZFS, copy data from all my other external HDDs to this one, format the other external HDDs with ZFS, and then start rotating between them.

---

Another way to reproduce this is with torrents. When I downloaded torrents directly to an external HDD, it also hung and got some errors, but in those cases it could be fixed with a `zpool clear` and scrubs, so it wasn't that bad (it wasn't literally unrecoverable, like the case I mentioned before).

---

So this leads me to believe there's something weird between ZFS, external HDDs, and trying to write too fast.

The whole point was to be able to run `zpool scrub` on those external HDDs. But like I said, I gave up on that for now. So the current plan is to try to build a NAS and do the same attempt, but with internal HDDs.

Maybe a failure in the connection or controller. But most likely the external disk was using SMR (shingled magnetic recording), which shouldn't be mixed with ZFS. There are different types of SMR, and some ZFS issues with SMR disks have been fixed in the past. servethehome.com has a detailed article (with benchmarks) on why those two technologies should not be mixed.
Any more info?

That sounds like a problem specific to the setup you were using. (?)

Saying that because if it was something that commonly happened, then either a) it would have been fixed, or b) people would have stopped using it. :)

If the specific setup is an external HDD (or maybe a very slow disk, and trying to write too fast to it), then I can make sense of parent's comment.

Like I mentioned in my sibling comment, I can 100% reproduce something that sounds like what parent mentioned (most recent attempt was like 1-2 months ago); but for my specific case, I can see how ZFS on an external HDD might not be that common.

I suggest playing with the ZFS tunings to add a write throttle. My hunch is your disk's buffer is filling and then blocking.
I think that there was an electrical problem (mismatched chargers on a laptop) in the first instance of mine.

Then there's encrypted datasets being very slow, especially when copying between two pools.

Then there's having two pools on the same disk corrupting each other. Though in this case the data might have been recoverable using some recovery tool (like a "modified zdb").

Also, I read on some website somewhere (maybe zfsonlinux.org) that USB devices probably shouldn't be used with ZFS.

Honestly I "wish" some bunch of crazy filesystem people would just clean room ZFS...

That sounds like a good idea, you just gave me new keywords to search for my next attempt.

Thanks!

Listen, people have to start using it to stop using it :=)