Hacker News
by tobias3 5407 days ago
I tested it and I don't recommend it. (It was like a year ago though) It was really slow and some blog posts about the reliability of the data storage backend were a little bit scary.

I would recommend using zfs-fuse. You don't have the FUSE->File on a filesystem->Hard disk indirection (thus more speed). And additionally you get all the cool ZFS features! If you need even more speed there is a ZFS kernel module for Linux and a dedup patch for btrfs. I don't think those are production ready though.

2 comments

I tried ZFS dedup but there was something like a 20x slowdown when writing files compared to ZFS without dedup, and this was on under ten gigabytes of files. I don't know if I somehow had the cache settings wrong or what the problem was, but I didn't manage to fix it, even trying both FUSE and kernel versions. (On Ubuntu 11.04)
Yeah, random access on hard disks is awfully slow. And if you have dedup you can cause lots of random access... If you have a little bit of data the hash table used for dedup can also be too big to fit into memory. Then ZFS puts it onto the disk and it is even slower. Luckily there is a feature to use SSDs as a cache device in this case.
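To make the hash table point concrete, here is a toy sketch of block-level dedup in Python. It is not how ZFS implements it; the function name `dedup_write`, the in-memory dict, and the choice of SHA-256 are all assumptions for illustration. It only shows why every block write needs a lookup in a table keyed by content hash, which turns into random I/O once that table no longer fits in RAM.

```python
import hashlib


def dedup_write(blocks, table=None):
    """Store each unique block once, keyed by its SHA-256 digest.

    Hypothetical helper: `table` stands in for the dedup table and
    maps digest -> {"data": bytes, "refcount": int}.
    """
    table = {} if table is None else table
    refs = []
    for block in blocks:
        key = hashlib.sha256(block).digest()
        # Every write costs one table lookup. If the table lived on
        # disk, this lookup would be a random read.
        if key not in table:
            table[key] = {"data": block, "refcount": 0}
        table[key]["refcount"] += 1
        refs.append(key)
    return table, refs


# Three 4 KiB blocks, two of them identical: only two are stored.
blocks = [b"a" * 4096, b"b" * 4096, b"a" * 4096]
table, refs = dedup_write(blocks)
print(len(table))
```

The file itself is then just the list of digests in `refs`, and the duplicate block is stored once with a reference count of 2.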
The tricky part seems to be 'too big to fit into memory'. From what I understood and calculated the dedup tables on my system should have been well under 100MB, and the amount of memory designated for metadata was over 350MB, yet the performance was terrible.
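For anyone wanting to repeat that kind of calculation, a rough back-of-the-envelope sketch follows. The numbers are assumptions, not measurements: roughly 320 bytes of memory per dedup table entry is a commonly quoted rule of thumb, and 128 KiB is the usual default record size. The function name `estimate_ddt_bytes` is made up for this example, and the real figure depends on the actual block size distribution and on how many blocks are unique.

```python
def estimate_ddt_bytes(data_bytes, block_size=128 * 1024, entry_bytes=320):
    """Rough upper bound on dedup table size, assuming every block is unique.

    block_size and entry_bytes are assumed rule-of-thumb values,
    not numbers read from a real pool.
    """
    blocks = -(-data_bytes // block_size)  # ceiling division
    return blocks * entry_bytes


ten_gib = 10 * 1024**3

# 128 KiB blocks: 81,920 entries
print(estimate_ddt_bytes(ten_gib) / 1024**2)  # 25.0 (MiB)

# 4 KiB blocks: 2,621,440 entries
print(estimate_ddt_bytes(ten_gib, block_size=4 * 1024) / 1024**2)  # 800.0 (MiB)
```

Under these assumptions ten gigabytes of large blocks lands well under 100MB, but the same data in small blocks could exceed a 350MB metadata limit, so the effective block size matters a lot for this estimate.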
Based on my testing (not published anywhere, sorry) ZFS dedup works best when you enable compression. With compression, it's only slightly slower than without dedup.
I did have compression on. Good to know that in some cases dedup will perform quite well. Was that with an SSD?

My best guess is that I either ruined the configuration in some way or dedup and only dedup reacted horribly to being in a virtual machine.

ZFS is designed to have lots of horsepower and memory thrown at it: big servers, available CPU power, lots of ECC RAM. If there's going to be an SSD allocated as a cache disk, it's probably expected to be huge and enterprisey too...

ZFS is awesome, but some features will be disappointing unless you are dealing with adequate resources.

Actually, except for DataDomain's ultra-expensive specialised hardware (and maybe a couple of similar enterprise solutions), all dedupe systems come with a huge performance hit. ZFS is no exception...