by perlgeek
2341 days ago
As a postmortem, this does not inspire confidence. It's a very technical piece, but it doesn't even try to take a customer's perspective. If you want to learn from such an outage, you have to do a fault analysis that leads to parameters you can control. Sure, there can be faulty hardware and software, but you are the ones selecting, running, and monitoring them. If recovery takes ages, you might want to practice recovery and improve your tooling. And so on. Blaming ZFS, faulty hardware, and old software all cries "we didn't do anything wrong", so no improvements are in sight.