by mike_hearn 667 days ago
I left over ten years ago and it's hard to understand that perspective. Back when I was an SRE (~2006 to 2009) there were only one or two monitoring systems (which didn't overlap, so you could argue there was one) and a handful of config languages. Compared to anywhere else, Google had military levels of discipline and order.

> Deployments were infrequent, unreliable, and sometimes even done from a dev's machine.

Deployments were weekly and done from a dev machine because that way someone was watching and could intervene if something unexpected happened. Some teams didn't do that and tried to automate rollouts completely. I could always tell which products weren't doing enough manual work because I'd encounter obviously broken features live in production, do a bit of digging, and discover that end-user complaints had been piling up in the support forums for months. But nobody was reading them, the metrics didn't show any problem, and changes kept flowing into prod, so the team just ... didn't realize their product wasn't working. There's no substitute for trying stuff out for yourself. These days I encounter clearly broken software that never seems to get fixed way too often, and I'm sure it's partly because the teams in question don't use their own product much and don't even realize anything is wrong.


I think the state of the art has moved on quite a way from this. I understand the point of view that someone should be watching a release, but the alternative is not "no one watching a release"; it is that binary releases should be no-ops. With feature flagging, the binary release itself should change no behavior, so it doesn't matter if no one is watching it.
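To make the "binary release as a no-op" idea concrete, here is a minimal sketch, assuming a simple in-memory flag store. The names `FlagStore`, `render_checkout`, and the `new_checkout` flag are hypothetical and not taken from any particular system; real setups would typically read flags from a config service.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlagStore:
    """Hypothetical in-memory flag store, standing in for a real config service."""
    flags: Dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so shipping new code changes nothing.
        return self.flags.get(name, False)

    def flip(self, name: str, enabled: bool) -> None:
        # The behavior change happens here, separately from the binary release.
        self.flags[name] = enabled


def render_checkout(flags: FlagStore) -> str:
    # The new code path ships in the binary but stays dark until the flag flips.
    if flags.is_enabled("new_checkout"):
        return "new checkout"
    return "old checkout"
```

Under this sketch the risky moment moves from the binary push to the call to `flip`, so that is the step someone would want to watch.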

Additionally, rolling out from a dev machine brings so many risks – security, reproducibility, human error, and so on.

I'm glad this is not the way things work anymore, and for the most part things are more reliable as a result.

Well, to be clear, "rollout from a dev machine" meant just that the rollout controller ran locally. The actual software being released was built by a release pipeline, placed into signed packages and so on, so it was all auditable. The people doing the rollouts were those who had production administrator access anyway for on-call troubleshooting and debugging, and permissions were enforced, so there was no security impact. And the same process was used for flag flips, so just putting everything behind flags didn't make much difference.
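A rough sketch of that shape of process, with everything here an assumption for illustration rather than a description of Google's actual tooling: a locally run controller that checks the operator's permission, verifies the package signature, and then advances stage by stage only while a human confirms. `SIGNING_KEY`, `PROD_ADMINS`, `sign`, and `staged_rollout` are hypothetical names, and the HMAC is a stand-in for whatever package signing a real pipeline uses.

```python
import hashlib
import hmac
from typing import Callable, List

SIGNING_KEY = b"release-pipeline-key"  # hypothetical; real pipelines likely use asymmetric keys
PROD_ADMINS = {"oncall-sre"}           # hypothetical set of users with production access


def sign(package: bytes) -> str:
    # Stand-in for the release pipeline signing a built package.
    return hmac.new(SIGNING_KEY, package, hashlib.sha256).hexdigest()


def staged_rollout(
    package: bytes,
    signature: str,
    operator: str,
    stages: List[str],
    confirm: Callable[[str], bool],
) -> List[str]:
    """Return the stages that were rolled out before the operator stopped."""
    # Permissions are enforced regardless of where the controller runs.
    if operator not in PROD_ADMINS:
        raise PermissionError(f"{operator} lacks production access")
    # Only packages signed by the pipeline may be released.
    if not hmac.compare_digest(sign(package), signature):
        raise ValueError("package signature mismatch")
    completed: List[str] = []
    for stage in stages:
        # The confirm hook stands in for the person watching the rollout.
        if not confirm(stage):
            break
        completed.append(stage)
    return completed
```

The same function could just as well gate a flag flip instead of a binary push, which is the point being made above: the watching and the access checks matter more than which kind of change is going out.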

It doesn't sound like what's done now is a whole lot different tbh.