Hacker News new | ask | show | jobs
by arupchak 4053 days ago
We are lucky that our customers care deeply about our availability, so feature work/improving the infrastructure is the same thing most of the time.

As for fostering the culture, part of it is that we perform post-mortems on all outages and measure the customer impact. Many of our problems do not have textbook answers, so the post-mortems become a cave diving exercise to find the answer. We have to build our own answers, and sometimes, like this one, build our own workarounds.