by 8organicbits
1293 days ago
> My current startups all run from the same big (but 10yr old) hardware

I'd love to hear more about the setup. I suspect something as simple as a disk failure would cause an outage, although I suppose you can detect a soon-to-fail disk via SMART and resolve that with scheduled maintenance/downtime. But what about power supply failure? Do you keep redundant backup parts on hand? There's definitely something nice about having a hardware error on a cloud VM result in that VM cycling out to new hardware automatically. In contrast, something as simple as buying a new off-the-shelf PSU feels like a ~1 hour downtime event (longer if you don't have purchase card authority, it's night time, you need to order online, etc.).
The system I referenced currently has 8x 4TB NVMe drives on ZFS raidz2 and 8x 16TB spinning rust drives, also on raidz2. I use sanoid to snapshot like crazy (down to every 15 minutes) and then syncoid to push those snapshots, and then some, to the spinning-disk ZFS array. On top of that, zfs send goes to a remote offsite machine.
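For a rough sense of what those two arrays provide, here is a back-of-the-envelope sketch. It assumes raidz2 spends two drives' worth of space on parity and ignores padding, metadata overhead, and the TB vs TiB difference, so real usable space will be somewhat lower. The function name is made up for illustration.

```python
def raidz2_usable_tb(drive_count: int, drive_tb: float) -> float:
    """Approximate usable capacity of one raidz2 vdev.

    Assumption: two drives' worth of space goes to parity, so the vdev
    survives any two simultaneous drive failures. Overheads are ignored.
    """
    if drive_count <= 2:
        raise ValueError("need more than two drives to hold any data")
    return (drive_count - 2) * drive_tb


nvme_tb = raidz2_usable_tb(8, 4)    # 8x 4TB NVMe pool
rust_tb = raidz2_usable_tb(8, 16)   # 8x 16TB spinning pool
print(f"NVMe pool: ~{nvme_tb} TB usable, spinning pool: ~{rust_tb} TB usable")
```

By this estimate the spinning array has about four times the usable space of the NVMe array, which is consistent with it holding a longer snapshot history than the primary pool.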
Modern switching power supplies are incredibly reliable. On top of that, run proper “half load” dual power supplies, fed from dual conditioned power feeds with UPS and generator backing. Losing power to a machine almost implies an extinction level event or complete incompetence on the part of a data center operator.
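The “half load” idea can be checked with simple arithmetic: if each of two supplies normally carries no more than half of its rating, either one can take over the full load when the other dies. A minimal sketch, with hypothetical wattages and function names chosen only for illustration:

```python
def per_psu_load_fraction(load_watts: float, psu_rating_watts: float,
                          psu_count: int = 2) -> float:
    """Fraction of its rating each supply carries when the load is shared evenly."""
    return load_watts / (psu_count * psu_rating_watts)


def survives_single_psu_failure(load_watts: float, psu_rating_watts: float,
                                psu_count: int = 2) -> bool:
    """True if the remaining supplies can carry the whole load after one fails."""
    remaining = psu_count - 1
    return remaining > 0 and load_watts <= remaining * psu_rating_watts


# Hypothetical numbers: a 600 W server on two 1200 W supplies.
print(per_psu_load_fraction(600, 1200))        # each supply at 25% of rating
print(survives_single_psu_failure(600, 1200))  # one supply alone can carry 600 W
print(survives_single_psu_failure(1500, 1200)) # 1500 W exceeds a single 1200 W supply
```

With two supplies, the failure check passes exactly when the shared load fraction is at or below 0.5, which is where the “half load” name comes from.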