Hacker News
by dboreham 1294 days ago
Confused as to why they didn't just replace the bad SSDs with good ones?

Fwiw this sounds to me like what happens when you use "retail" SSDs (drives marketed for use in user laptops) underneath a high write traffic application such as a relational database. Often such drives will either wear out or will turn out to have pathological performance characteristics (they do something akin to GC eventually), or they just have firmware bugs. Use enterprise rated drives for an application like this.

4 comments

Hi, I made the decision not to replace the drives. I also wrote the article, and am the admin of Hachyderm.

So to be clear, we did try to "offline" a drive from the ZFS pool just to see if this was a viable path. The ZFS pool was set up a few years ago and has gone through a few iterations of disks. The mirrors were unbalanced. We had pairs of drives of one manufacturer/speed mirrored with pairs of drives from another manufacturer/speed. We know this configuration was wrong; again, we didn't intend for our little home lab to turn into a small production service.

I think after spending a few hours trying to "offline" the disk, and then repairing the already brittle ZFS configuration to get the database/media store back to a "really broken and slow but still technically working" state, we just decided to pull the plug and move to Hetzner. Offlining the disk caused even more cascading failures and took about 30 minutes just for the software. We could have technically shut down production to try without the database running on it, but at that point we decided to just get out of the basement.

If it had been as easy as popping a disk in/out of the R630 (like one would imagine), we certainly would have done that.

To be honest I am still very interested in performing more analysis on ZFS on a 6.0.8 Linux kernel. I am not convinced ZFS didn't have more to do with our problems than we think. I will likely do a follow up article on benchmarking the old disks with and without ZFS in the future.

zfs-2.1.4-1 zfs-kmod-2.1.6-1 6.0.8-arch1-1

> We had pairs of drives of one manufacturer/speed mirrored with pairs of drives from another manufacturer/speed.

The different speed is an issue, but I always recommend mixing pairs so that you don’t end up like me, when all spinning metal of the same RAID-5 array failed in a short period. Wasn’t a great day.

Lucky me I had a contingency plan.

Throw ZFS away, put in X drives, make RAID10+LVM with X-1 drives (Linux supports odd numbers of drives in RAID10), and never think about it again. It's simple to set up, simple to debug, and you don't need a ZFS expert for something as simple as disk replacement. In cases like what happened here, there is the --write-mostly option, which tells Linux RAID to prefer other disks for reads, so you can see whether unloading the drive changes anything. Maybe RAID6 if you're not screaming for performance but want some more space.
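For a rough sense of what an odd-drive RAID10 gives you, here is a back-of-the-envelope sketch. It assumes the array keeps two copies of every block spread across all member drives, and it ignores metadata overhead and chunk rounding. The function name and the 2 TB drive size are made up for illustration.

```python
def raid10_usable_bytes(drive_bytes: int, drives: int, copies: int = 2) -> int:
    """Approximate usable capacity when every block is stored `copies` times.

    Ignores metadata and chunk rounding, so treat the result as an estimate.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if drives < copies:
        raise ValueError("need at least as many drives as copies")
    # Total raw space divided by the number of copies kept of each block.
    return drive_bytes * drives // copies


TB = 10**12  # decimal terabyte, as drive vendors count it

# Hypothetical 2 TB drives in 5-drive and 7-drive arrays.
five_drive = raid10_usable_bytes(2 * TB, 5)
seven_drive = raid10_usable_bytes(2 * TB, 7)
```

With two copies, an odd drive count simply yields half the raw total, so nothing is wasted by leaving one drive out as a spare.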

Focus your efforts on making robust backups instead. You don't want to be the only guy in the org who knows how to do ZFS things when it breaks.

We're running a few racks of servers. ZFS is delegated to big boxes of spinning rust where its benefits (deduplication/compression) are used well, but on a bunch of SSDs it is just overkill.

Then you will have the same problems, but now you can bother the manufacturer about it!

Also unless there is something horribly wrong about how often data is written, that SSD should run for ages.

We ran (for a test) consumer SSDs in a busy ES cluster and they still lasted like 2 years just fine.
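A quick way to sanity-check "should run for ages" is to divide the drive's rated endurance by the daily write volume. The sketch below assumes the vendor publishes a TBW (terabytes written) rating and that you know roughly how much the host writes per day. The 600 TBW and 200 GB/day figures are hypothetical, not numbers from the article.

```python
def years_until_rated_endurance(tbw_rating_tb: float, writes_gb_per_day: float) -> float:
    """Years of service before host writes reach the drive's rated TBW.

    This is only a rating-based estimate; real drives may fail earlier
    or keep working well past the rating.
    """
    if writes_gb_per_day <= 0:
        raise ValueError("writes_gb_per_day must be positive")
    tb_per_day = writes_gb_per_day / 1000  # decimal GB to TB
    return tbw_rating_tb / tb_per_day / 365


# Hypothetical drive rated for 600 TBW under 200 GB/day of writes.
estimate = years_until_rated_endurance(600, 200)
```

If the estimate comes out at only a few months, that points at the "something horribly wrong" case: either the workload writes far more than expected, or the drive is the wrong class for it.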

The whole setup was a bit overcomplicated too. RAID10 with 5+1 or 7+1 (yes, Linux can do 7-drive RAID10) with a hot spare would've been entirely fine, easier, and most likely faster. You need backups anyway, so ZFS doesn't give you much here, just extra CPU usage.

Either way, monitor wait per drive (the easy way is to just plug collectd [1] into your monitoring stack; it is light and can monitor A TON of different metrics).

* [1]https://collectd.org/
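If you just want to eyeball per-drive wait without setting up a collector, something like the following sketch works on Linux. It is not how collectd does it. It assumes the standard /proc/diskstats field layout (reads completed, time reading, writes completed, time writing) and computes the average milliseconds per completed I/O between two samples. The function names are made up for illustration.

```python
import time


def parse_diskstats(text: str) -> dict:
    """Map device name -> (completed I/Os, ms spent on them) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        stats[fields[2]] = (reads + writes, read_ms + write_ms)
    return stats


def await_ms(before: dict, after: dict) -> dict:
    """Average ms per completed I/O for each device between two samples."""
    result = {}
    for dev, (ios_after, ms_after) in after.items():
        if dev not in before:
            continue
        ios_before, ms_before = before[dev]
        delta_ios = ios_after - ios_before
        # An idle device has no completed I/Os in the interval; report 0.0.
        result[dev] = (ms_after - ms_before) / delta_ios if delta_ios > 0 else 0.0
    return result


def sample_await(interval_s: float = 5.0, path: str = "/proc/diskstats") -> dict:
    """Read the stats file twice, `interval_s` seconds apart, and return per-device await."""
    with open(path) as f:
        before = parse_diskstats(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_diskstats(f.read())
    return await_ms(before, after)
```

Calling `sample_await()` on the box and looking for one drive whose number is far above its mirror partner is the kind of signal that would single out a misbehaving disk.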

It costs money.

Remember this isn't a company: it's hobbyists/enthusiasts putting their own resources into something or running with donations when available. There's no venture capital to absorb operating losses here. Remember the old "storm/norm/conform/perform" analogy. We are still very much pushing along into norm territory, and articles like this will help establish a conform phase ... but it will take time.

> Confused as to why they didn't just replace the bad SSDs with good ones?

Probably because they wanted to migrate to Hetzner anyway and took the chance to do it now instead of later.

But I do agree that it probably would have been a better idea.