Hacker News
by vilda 5293 days ago
This sounds like you had a really bad implementation. A proper file server of this small size would not fail for several hours per month.
2 comments

Absolutely true. The RAID controller would randomly lose drives, and its driver would randomly cause kernel panics. We tried different firmware versions and different kernels and made some progress, but never really got it stable under load.

However, that's the risk you run with single points of failure. Put all your data on one big box, and any failure in your RAID hardware, RAID firmware, RAID drivers, network drivers, kernel, RAM, OS, et cetera will take down the big box and thus take down anything relying on it.

The lesson I learned wasn't to make a super-robust single system, it was to have enough redundancy to stay up when something inevitably fails.
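
To put rough numbers on that, here is a minimal sketch of the arithmetic. Everything in it is an assumption for illustration: the function names are made up, the 99.9% per-component availability is hypothetical rather than measured, and failures are treated as independent, which real hardware often violates.

```python
from math import prod

HOURS_PER_MONTH = 730  # approximate average month length in hours


def series_availability(component_availabilities):
    """Availability of one box where every component must work at once."""
    return prod(component_availabilities)


def redundant_availability(node_availability, replicas):
    """Availability when any one of `replicas` independent nodes is enough."""
    return 1 - (1 - node_availability) ** replicas


def downtime_hours_per_month(availability):
    """Expected hours of downtime in an average month."""
    return (1 - availability) * HOURS_PER_MONTH


# Hypothetical figures: RAID hardware, firmware, RAID driver, network driver,
# kernel, RAM, and OS, each assumed to be up 99.9% of the time.
single_box = series_availability([0.999] * 7)
two_boxes = redundant_availability(single_box, replicas=2)

print(f"single box: {downtime_hours_per_month(single_box):.2f} h/month down")
print(f"two boxes:  {downtime_hours_per_month(two_boxes):.2f} h/month down")
```

Under those assumptions, the single box is only as available as the product of all its parts, while a second independent box multiplies the failure probabilities together instead. Correlated failures (same firmware bug on both boxes, shared power) would narrow that gap.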

I agree; that just should not happen unless the server was a complete lemon or badly assembled.