by caw 4919 days ago
Since not everyone here is ops: if your holiday could be impacted by a deployment, you know that going into the deployment. We take note of people with blackout dates (e.g. you booked your flight before we ever started talking about this), and everyone else impacted knows what's on the docket. While the issues themselves are sudden, everyone at least has that nagging feeling that they might get called in.

I agree that we should be moving to automated infrastructure testing and the like. To some extent that may be possible via Puppet/Chef/automation tools, but not all infrastructure is like that. Sometimes you have to go physically move stuff during your downtime window, and you can't always do redundant wiring (particularly for network). I've been bitten by network outages more than anything else, particularly partial or undetected failures.

I think we're seeing a move toward the "treat infrastructure as code" future, with clustered fileservers (NetApp Data ONTAP 8 Cluster-Mode, or Isilon systems) as one example. You'll be able to "seamlessly" migrate data and virtual interfaces around without impacting production. I'm looking forward to seeing how that changes ops.