HN Mirror

by 0n34n7 1746 days ago

Agreed. Good application code often contains edge case handling, build time checks, unit tests and defensive flows that handle the unexpected so that users don't wake you up at night. Why can Ops not do the same? Why can Dockerfiles / Orchestrators / CI / playbooks not also implement sanity checks on deployments?

"Ooops... deployment failed. While deploying your artifact we found the following:

- Nothing is listening on the nominated port

- Your deployment is utilizing 100% CPU while idling

- We detected an abnormal volume of write operations to the mount

Please fix these issues and re-trigger the pipeline at your earliest convenience.

Regards, Ops."
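As a rough illustration of what such a post-deploy sanity check could look like, here is a minimal Python sketch covering the first two findings in the message above. Everything in it is hypothetical: `check_deployment`, `SanityReport` and the 90% threshold are made-up names and values, and the idle CPU figure is assumed to be measured elsewhere (e.g. by your monitoring) and passed in. A write-volume check would follow the same pattern.

```python
import socket
from dataclasses import dataclass, field


@dataclass
class SanityReport:
    """Collects human-readable findings from post-deploy checks (hypothetical)."""
    findings: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.findings

    def render(self) -> str:
        if self.ok:
            return "Deployment passed all sanity checks."
        lines = ["Ooops... deployment failed. While deploying your artifact we found the following:"]
        lines += [f"- {finding}" for finding in self.findings]
        return "\n".join(lines)


def port_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


def check_deployment(host: str, port: int, idle_cpu_percent: float,
                     cpu_limit: float = 90.0) -> SanityReport:
    """Run the checks; idle_cpu_percent is assumed to come from external monitoring."""
    report = SanityReport()
    if not port_is_listening(host, port):
        report.findings.append("Nothing is listening on the nominated port")
    if idle_cpu_percent >= cpu_limit:
        report.findings.append(
            f"Your deployment is utilizing {idle_cpu_percent:.0f}% CPU while idling")
    return report
```

A pipeline step would call `check_deployment(...)`, print `report.render()`, and fail the job when `report.ok` is false.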

clipradiowallet 1746 days ago

> - Nothing is listening on the nominated port

Now that just shouldn't happen... i.e., we (Ops) aren't going to deploy something that doesn't come with healthcheck(s). A healthcheck that never passes (because the port isn't listening) will stop the deployment from ever completing. Ops' job is to push back on developers if they try to hand us something like this to build a pipeline for. In my company, before you can hand Ops the name of a repo and say "build a pipeline", there are a lot of requirements, and the biggest one is a list of SLAs. That list of SLAs is how we build monitoring for your application, and one item in it should always be the list of port(s) and protocol(s) that are exposed; we build monitors against those.
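The "healthcheck never passes, so the deployment never completes" behavior can be sketched as a polling gate with a deadline. This is a minimal illustration under assumed names (`wait_until_healthy`, `DeploymentFailed`, and the timeout values are hypothetical); in Kubernetes a similar role is played by readiness probes together with a Deployment's progress deadline. The clock and sleep functions are injectable only so the logic can be tested without waiting.

```python
import time
from typing import Callable


class DeploymentFailed(Exception):
    """Raised when a new release never becomes healthy (hypothetical)."""


def wait_until_healthy(probe: Callable[[], bool], timeout_s: float = 60.0,
                       interval_s: float = 2.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `probe` until it returns True and return the number of attempts used.

    Raises DeploymentFailed once the deadline passes, so the pipeline stops
    instead of marking the rollout as complete.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if probe():
            return attempts
        if clock() >= deadline:
            raise DeploymentFailed(f"healthcheck still failing after {attempts} attempts")
        sleep(interval_s)
```

The probe can be any callable, for example one that checks that the SLA-listed port is accepting connections.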

jameshart 1746 days ago

I've been struggling a little in this thread trying to understand what these 'ops' teams as described are doing.

Being a human kubernetes seems to be the crux of it.

In my company, to hand 'ops' (kubernetes) the name of a repo and say 'build a pipeline', it's basically a matter of committing a gitlab-ci.yaml file.

clipradiowallet 1745 days ago

It varies with the company you are at... but building pipelines is typically just one of the Ops tasks. For example, in your gitlab-ci.yaml example, Ops would have given you the template (assuming you were going to be the one committing it) that your pipeline had to follow, so that it was uniform with the thousands of other pipelines in the company. At most of the shops I've been at, it's unusual for devs to ever build their own deployment pipelines. That might be OK at a smaller shop, but once you have 100+ deployments of any type, devs have lacked the ability to keep it all uniform at scale.
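One way to picture the "Ops hands you the template" model is a centrally owned pipeline body where each project only supplies parameters, and a request is rejected if required fields are missing. This is a rough Python sketch: the template text is loosely GitLab-CI-shaped, and `deploy-tool`, `render_pipeline`, and the parameter names are hypothetical, not a real configuration.

```python
from string import Template

# Hypothetical org-wide template owned by Ops; projects only fill in parameters.
PIPELINE_TEMPLATE = Template("""\
stages: [build, deploy]

build:
  stage: build
  script: ["docker build -t $image ."]

deploy:
  stage: deploy
  script: ["deploy-tool --image $image --port $port --healthcheck $healthcheck"]
""")

# Fields every project must provide before a pipeline is generated.
REQUIRED = ("image", "port", "healthcheck")


def render_pipeline(params: dict[str, str]) -> str:
    """Render the shared template, rejecting requests with missing fields."""
    missing = [key for key in REQUIRED if key not in params]
    if missing:
        raise ValueError(f"pipeline request rejected, missing: {', '.join(missing)}")
    return PIPELINE_TEMPLATE.substitute(params)
```

Changing `PIPELINE_TEMPLATE` changes every generated pipeline at once, which is the uniformity argued for here and also the coupling questioned later in the thread.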

A better way to put it: most of my colleagues in Ops roles already did development for 10-15 years and moved on to developing the tools that deploy other people's products. Additionally, Kubernetes isn't everywhere; they also build pipelines that produce AMIs and GCP images, and write the Terraform/CloudFormation/Heat/etc. to deploy those things. If you wonder "who automated blue/green deployments?", that's your Ops team.
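For readers unfamiliar with blue/green deployments, the core idea fits in a few lines: two identical environments, one live and one idle; a release goes to the idle side and traffic flips only if it passes its check. This is a toy in-memory model with hypothetical names (`BlueGreen`, `release`); a real setup would flip a load balancer or DNS entry rather than a field.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BlueGreen:
    """Two identical environments; `live` names the one receiving traffic."""
    versions: dict[str, str] = field(default_factory=lambda: {"blue": "", "green": ""})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def release(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Deploy to the idle side; switch traffic only if it passes its check."""
        target = self.idle
        self.versions[target] = version
        if not healthy(target):
            return False  # traffic stays on the current live side
        self.live = target
        return True
```

Because the previous version is still running on the other side, rolling back is just flipping `live` back.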

Also, regarding your "human kubernetes" example: Ops builds those clusters and monitors those node pools. If you have fewer than a dozen clusters, or fewer than 100 nodes across the pools, you might not even have an Ops team.

jameshart 1745 days ago

What’s the value of uniform pipelines across thousands of projects? You’ve now got thousands of teams who don’t really understand how their stuff is deployed because someone handed that to them on a platter, and everyone is subjected to the uniform constraints and complexities of the shared solution whether they need to be or not…

What seem like efficiencies can rapidly become barriers.

It’s like any case where two systems share a requirement - you can factor it out into a shared library or you can duplicate the code; in the case where the common requirement is only ‘coincidental’ not ‘instrumental’, you are better off duplicating the code so that the two systems can evolve independently and not take a coupling to a shared dependency.

The same applies to infrastructure. Sure, you’ve got a dozen clusters, and it seems efficient to have one team set up and operate all of them - but are you sure the efficiency of one big team is better than twelve much smaller teams, closer to their dev orgs, who each run one cluster more tightly suited to that org’s needs?

How far down can you push that decentralization?

With smaller and smaller units of cloud compute and storage being available as services, the answer is increasingly ‘all the way to each individual application’.

jensensbutton 1746 days ago

> Why can Ops not do the same? Why can Dockerfiles / Orchestrators / CI / playbooks not also implement sanity checks on deployments?

All of those things were written by developers.

rswail 1741 days ago

Or you adopt SRE and the necessary guard rails, guidance, non-functional requirements and gates so that PO/PMs cannot overrule them.
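One common SRE-style gate of this kind is an error-budget policy: the SLO defines how many failures are acceptable, and once that budget is spent, feature releases stop regardless of who asks. For example, a 99.9% SLO over 1,000,000 requests allows 1,000 failed requests; after 400 failures, 60% of the budget remains. The sketch below is illustrative only; the function names and the `emergency_fix` exception are assumptions, not a standard API.

```python
def error_budget_remaining(slo: float, total: int, failed: int) -> float:
    """Fraction of the error budget left (negative once overspent)."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1, e.g. 0.999")
    if total == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total
    return 1.0 - failed / allowed_failures


def release_allowed(slo: float, total: int, failed: int,
                    emergency_fix: bool = False) -> bool:
    """Gate: block feature releases once the budget is spent; only fixes pass."""
    return emergency_fix or error_budget_remaining(slo, total, failed) > 0.0
```

Encoding the policy in the pipeline, rather than in a meeting, is what keeps it from being overruled case by case.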

seniorThrowaway 1746 days ago

"Oh those are normal errors" - Every developer I've ever worked with
