by tripleo1 1186 days ago
ZFS is fun, but it sometimes locks up for unexplained reasons.

I don't know if it's the same issue you have in mind, but I can 100% reproduce ZFS hanging and thinking a disk is unusable.

Steps to reproduce:

* Get an external HDD. I literally bought a new, different external HDD because I thought the old one was the problem (spoiler: nope, I could still reproduce it 100% of the time on the new disk).

* Create a zpool for the whole disk.

* Create a dataset.

* Try to rsync several hundred GB to the ZFS dataset.

* Wait for a minute or two.

* Notice that it stops transferring data and gives a weird error complaining that the disk is unhealthy, faulty, or something similar (I don't remember the exact terms).
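
The reproduction steps can be scripted roughly like this. It is a minimal sketch in Python and assumes the OpenZFS `zpool`/`zfs` CLIs and rsync 3.1+ (for `--info=progress2`). The device path, pool, dataset, and source names are hypothetical placeholders. Because `zpool create` on the wrong disk destroys it, the script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# All names below are hypothetical placeholders, not values from the thread.
# Pointing DEVICE at the wrong disk destroys its contents, so commands are
# only printed unless dry_run=False is passed explicitly.
DEVICE = "/dev/sdX"          # placeholder for the external HDD
POOL = "backup"              # hypothetical pool name
DATASET = f"{POOL}/data"     # hypothetical dataset name
SOURCE = "/path/to/source/"  # placeholder for several hundred GB of files


def repro_commands(device: str, pool: str, dataset: str, source: str) -> list[list[str]]:
    """Return the reproduction steps as argv lists, in order."""
    return [
        # Pool spanning the whole external disk.
        ["zpool", "create", pool, device],
        # Dataset; with default mountpoints it appears at /<pool>/<name>.
        ["zfs", "create", dataset],
        # Large, sustained write into the dataset.
        ["rsync", "-a", "--info=progress2", source, f"/{dataset}/"],
        # After the transfer stalls, inspect the reported device state.
        ["zpool", "status", "-v", pool],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=False so `zpool status` still runs after a failed rsync.
            result = subprocess.run(argv, check=False)
            print("exit code:", result.returncode)


if __name__ == "__main__":
    run(repro_commands(DEVICE, POOL, DATASET, SOURCE))
```

Keeping the steps as argv lists makes the dry run and the real run print exactly the same commands.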

No amount of `zpool clear` or `zpool scrub` will fix it. I gave up and just formatted it as ext4 like all my other backup disks.

My use case for this was having this external HDD as a backup. The plan was to format this as ZFS, copy data from all my other external HDDs to this one, format the other external HDDs with ZFS, and then start rotating between them.

---

Another way to reproduce this is with torrents. When I downloaded torrents directly to an external HDD, it also hung and threw some errors, but in those cases it could be fixed with a `zpool clear` and scrubs, so it wasn't that bad (it wasn't literally unrecoverable, like the case I mentioned before).
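
The `zpool clear` plus scrub recovery from the torrent case can be written as a fixed command sequence. This is a minimal sketch that only prints the commands. The pool name is a hypothetical placeholder, and `zpool wait` assumes OpenZFS 2.0 or later.

```python
import shlex

POOL = "backup"  # hypothetical pool name


def recovery_commands(pool: str) -> list[list[str]]:
    """Return a clear-then-scrub recovery sequence as argv lists."""
    return [
        # Show only pools that have errors or are unavailable.
        ["zpool", "status", "-x"],
        # Clear device errors (and try to resume a suspended pool).
        ["zpool", "clear", pool],
        # Start a scrub; it runs in the background.
        ["zpool", "scrub", pool],
        # Block until the scrub finishes (OpenZFS 2.0+).
        ["zpool", "wait", "-t", "scrub", pool],
        # Review the result and any files with permanent errors.
        ["zpool", "status", "-v", pool],
    ]


if __name__ == "__main__":
    for argv in recovery_commands(POOL):
        print("$", shlex.join(argv))
```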

---

So this leads me to believe there's something weird between ZFS, external HDDs, and trying to write too fast.

The whole point was to be able to run `zpool scrub` on those external HDDs. But like I said, I gave up on that for now. So the current plan is to build a NAS and try again, but with internal HDDs.

Maybe a failure in the connection or controller. But most likely the external disk was using SMR (shingled magnetic recording), which shouldn't be mixed with ZFS. There are different types of SMR, and some ZFS issues with SMR disks have been fixed in the past. servethehome.com has a detailed article (with benchmarks) on why those two technologies should not be mixed.

Any more info?

That sounds like a problem specific to the setup you were using. (?)

I'm saying that because if it were something that happened commonly, then either a) it would have been fixed, or b) people would have stopped using it. :)

If the specific setup is an external HDD (or maybe a very slow disk that you're trying to write to too fast), then parent's comment makes sense to me.

As I mentioned in my sibling comment, I can 100% reproduce something that sounds like what parent described (my most recent attempt was about 1-2 months ago), but I can see how my specific case, ZFS on an external HDD, might not be that common.

I suggest playing with the ZFS tunings to add a write throttle. My hunch is that your disk's buffer is filling up and then blocking.
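
ZFS already has a write throttle. Once unsynced ("dirty") data passes a threshold, each write is delayed, and the delay grows as dirty data approaches a ceiling. The sketch below models that delay curve on the formula in OpenZFS's transaction code as I understand it. The parameter names mirror OpenZFS module parameters, but the constant values are illustrative assumptions, not values from this thread.

```python
# Sketch of the OpenZFS per-transaction write delay ("write throttle").
# Parameter names mirror OpenZFS module parameters; the values below are
# assumptions for illustration, so check your platform's actual defaults.
ZFS_DIRTY_DATA_MAX = 4 * 1024**3   # bytes; the real default is derived from RAM
ZFS_DELAY_MIN_DIRTY_PERCENT = 60   # throttling starts above this fill level
ZFS_DELAY_SCALE = 500_000          # ns; delay when halfway from start to max
ZFS_DELAY_MAX_NS = 100_000_000     # 100 ms cap on a single delay


def tx_delay_ns(
    dirty: int,
    dirty_max: int = ZFS_DIRTY_DATA_MAX,
    min_percent: int = ZFS_DELAY_MIN_DIRTY_PERCENT,
    scale: int = ZFS_DELAY_SCALE,
    max_ns: int = ZFS_DELAY_MAX_NS,
) -> int:
    """Delay (ns) applied to a write given `dirty` bytes of unsynced data."""
    delay_min_bytes = dirty_max * min_percent // 100
    if dirty <= delay_min_bytes:
        return 0  # below the threshold, writes are not delayed
    if dirty >= dirty_max:
        # Real ZFS blocks the writer here until a txg sync frees space;
        # this sketch just reports the cap.
        return max_ns
    # The delay grows without bound as dirty data approaches dirty_max.
    delay = scale * (dirty - delay_min_bytes) // (dirty_max - dirty)
    return min(delay, max_ns)


if __name__ == "__main__":
    for pct in (50, 70, 80, 90, 99):
        dirty = ZFS_DIRTY_DATA_MAX * pct // 100
        print(f"{pct:3d}% dirty -> {tx_delay_ns(dirty) / 1e6:.3f} ms")
```

In this model, lowering `zfs_dirty_data_max` or raising `zfs_delay_scale` makes the throttle engage earlier and harder, which is the kind of tuning suggested above. On Linux these parameters are exposed under `/sys/module/zfs/parameters/`. Whether slowing writes this way avoids the stall on a particular external disk is untested here.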

I think there was an electrical problem (mismatched chargers on a laptop) in my first instance.

Then there are encrypted datasets being very slow, especially when copying between two pools.

Then there's the case of two pools on the same disk corrupting each other, though there the data might have been recoverable with a recovery tool (like a "modified zdb").

Also, I read on some website somewhere (maybe zfsonlinux.org) that USB devices probably shouldn't be used with ZFS.

Honestly, I "wish" some bunch of crazy filesystem people would just clean-room ZFS...

That sounds like a good idea; you just gave me new keywords to search for on my next attempt.

Thanks!

Listen, people have to start using it to stop using it :=)