by jammycakes 1385 days ago
In a previous job, I joined a team that was supposed to be introducing DevOps to the organisation.

It started out well -- we spent a few months hacking with Terraform, Docker, Vagrant, Kubernetes, and related technologies to implement an infrastructure-as-code approach -- automating the process of provisioning and decommissioning servers, and building a setup where development teams could deploy updates with a simple git push.

Unfortunately it all went downhill fairly rapidly. We ended up spending the majority of our time manually applying security patches to a bunch of snowflake servers that we'd lifted-and-shifted from another hosting provider to AWS, and fielding support requests from the development teams. Within a year, we were being told in no uncertain terms by our project manager that we were an operations team, not a development team.

It felt like a complete bait-and-switch. Within two years, I had left the organisation in question and moved on to a new job elsewhere doing actual development again. Last I heard, the entire team had been disbanded.

It sounds like the author of this article must have had a very similar experience. I wonder just how common it is. It seems that in many places, "DevOps" is all Ops and no Dev.

6 comments

> It seems that in many places, "DevOps" is all Ops and no Dev.

This was definitely my experience at my last couple of jobs. At my last one they "scaled out their DevOps team" by hiring tons of juniors with next to no software development background. And then they "empowered" teams by assigning the juniors to each dev group. As a result, we ended up having to train them how to do their core jobs, which... went about as well as you'd think.

Eventually, there was an attempt to shift everyone to Kubernetes. They had a special "DevOps" team build a layer on top of it to handle the non-Kubernetes aspects of deployment as well, and somehow manage them together using Helm. If you're wondering "what the hell does that mean", well, it turned out nobody really knew. These "DevOps" engineers didn't really seem to understand Kubernetes core concepts, and just ended up hacking away with some scripts on top of Terraform delivered via Helm until something got configured. It was incredibly slow to deliver, hard to use, and I just stayed away from it until some exec threw down the mandates. (And then everyone started quitting because it was an absolute disaster.)

Ultimately, these are really stories about bad management, not really anything to do with DevOps. But that's how these things roll - some new hot concept comes to town, and bad managers try to adopt the term, without really understanding it.

What you just described is the exact same reason I started working on https://stacktape.com 3 years ago.

When doing the market research, I talked to ~150-200 companies (mostly SMBs). Everyone was trying to "do DevOps". But the complexity of running a Kubernetes cluster (or a custom AWS setup using ECS) is just overwhelming for most of the teams.

In most cases, the DevOps/platform team requires at least 2-3 experienced people who have successfully done this before.

Considering how few DevOps people with this kind of experience are currently available on the market, it's no surprise that only the "coolest" companies around get to hire them. These successful companies then write blog posts about how successful they were.

And the circle starts all over again. Less successful companies follow them and (in most cases) fail.

Most of these companies don't admit it, or don't admit it soon enough. They also don't write blogposts about their failures.

From my experience and research, roughly 70-80% of companies fail to deliver the expected results (or deliver them with an order of magnitude more effort than initially expected). Yet 90-95% of the content we get to read about these topics is overwhelmingly positive.

PS: If you don't have an A-tier DevOps team, check out https://stacktape.com. I promise it will make your life easier.

It's not always bad managers or management. The fancy new hotness gets pushed onto them by smooth-talking evangelists. For every criticism or question they have a snazzy little quip and retort that makes you look like an idiot for not knowing "the obviousness" of your errors and how this new fad/tech/framework/methodology solves it. And if that retort doesn't work, they just tell you "it's standard in the industry, dunno what you want me to tell you".

And from the outside, this all just looks like resume padding and job-security. Devops is the new priesthood, subscribe or be reduced to irrelevance by config warriors.

> The fancy new hotness gets pushed onto them by smooth-talking evangelists.

Gets pushed onto them top-down. This means there is no real competition, no real empiricism, no real comparative merit involved in the switch.

How should never be imposed top-down. It's only goals and how success is measured which should be top-down. The classic example of this was when Jeff Bezos mandated that all Amazon software systems should be accessible by APIs over the internal network, or else one would be fired.

Whenever I've seen How imposed top-down, I've only seen the lower-level managers talk about how they could put off and passive-aggressively stymie the initiative.

Wow, sounds like exactly what happened at a $previousCompany-1 and exactly what prompted me to leave.

> We ended up spending the majority of our time … fielding support requests from the development teams

This has been my experience at 3 different small-to-medium companies now. A too-small DevOps team is suddenly on the critical path for even the most trivial software task, and engineering productivity grinds to a halt. I think a much better pattern would be to enable dev teams to self-serve. Set up the required infrastructure and guard rails, then let teams handle their own deployments and infrastructure. Give people what they need to do it themselves instead of having to open a support ticket for everything.
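To make "guard rails" a bit more concrete, here is a minimal sketch of what a self-serve check might look like. Everything in it is an assumption for illustration: the `DeployRequest` fields, the approved regions, the replica limit, and the function names are all hypothetical, and a real platform would enforce this in its own tooling rather than in a toy function.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical limits a platform team might publish; the values are illustrative only.
ALLOWED_REGIONS = {"eu-west-1", "us-east-1"}
MAX_REPLICAS = 10


@dataclass(frozen=True)
class DeployRequest:
    """What a dev team submits instead of opening a support ticket."""
    team: str
    service: str
    region: str
    replicas: int
    public: bool = False


def check_guard_rails(req: DeployRequest) -> list[str]:
    """Return a list of violations; an empty list means the deploy may proceed."""
    violations = []
    if req.region not in ALLOWED_REGIONS:
        violations.append(f"region {req.region!r} is not approved")
    if not 1 <= req.replicas <= MAX_REPLICAS:
        violations.append(f"replicas must be between 1 and {MAX_REPLICAS}")
    if req.public:
        violations.append("public exposure needs a platform-team review")
    return violations


def self_serve_deploy(req: DeployRequest) -> str:
    """Deploy without human involvement unless a guard rail is hit."""
    violations = check_guard_rails(req)
    if violations:
        return "rejected: " + "; ".join(violations)
    # A real implementation would trigger the actual rollout here.
    return f"deployed {req.service} for {req.team} to {req.region} x{req.replicas}"
```

The point of the design is that the platform team only gets pulled in for the exceptional cases (here, public exposure), while routine deploys never touch a ticket queue.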

> I think a much better pattern would be to enable dev teams to self serve. Set up the required infrastructure and guard rails, then let teams handle their own deployments and infrastructure.

I think that's how DevOps is actually supposed to be done in the first place. You view Ops -- and the code used to manage and support it -- as a product, and get a specialised team of experienced Devs (and architects) to build it.

Once you've got the basic infrastructure and architecture in place, you then train up the individual development teams to customise it, extend it and troubleshoot it as they need to. In much the same way as they do with any other software product.

My experience is that what inevitably happens is the ops team goes and writes a layer on top of Kubernetes, and now instead of dealing with Kubernetes you're dealing with a half-baked, poorly written abstraction on top of Kubernetes, with zero documentation and no StackOverflow. So you need to become an expert in both.

Most organizations don't have the resources, mindset, or skills to support a software library product and should only do it as a last resort.

The problem is that most developers won’t know how to do it without constantly footgunning themselves.

If the dev ops team is staffed well enough to develop integrations that won’t allow that AND won’t get in the way, and then train folks and ‘hold the line’ enough to stop the scope creep - they’re probably not at a shitty small/medium-sized shop.

Here is how some companies "do DevOps":

  1. An operations team with a different name.
  2. A platform team with a different name.
  3. A development team with a different name.
  4. A "CI/CD team".
  5. A role (e.g. "dev who automates ops", "ops who codes", "support specialist who codes", or all three in one).
  6. A chart that the delivery manager maintains.

Here is what DevOps should actually be:

  1. Delivering rapidly and consistently with extremely high levels of confidence.
  2. The right people address problems correctly, immediately, the first time, and fix it so it doesn't happen again.
  3. That's it.

Come on, this is such a joke.

Increased velocity is what businesses get promised, yes. It’s what you want.

The reality is that you can’t just magically make that happen.

Draw a line. Now draw the rest of the owl! Easy~

The problem has never been understanding what the desired state is… it’s always been that getting from the current state to the desired state is very very hard, and continually road blocked by:

- people who don’t want to learn new skills

- people who don’t want to cede control they currently have (over process and product)

- a lack of clarity on who is responsible for what systems

Devops is a load of hype.

There’s never been any reliable process to move to rapid delivery from it.

Yes, some teams have managed to get something that works, and there are a lot of tools and a lot of training, which has resulted in overall better SRE processes.

…but by and large, that’s because of using better tools (eg. infrastructure as code) not because of devops.

When was the last time you had a “devops” guy you got to do something for you?

Right. That’s ops.

When was the last time something broke and the people responsible for making sure it never happened again were the “platform team” or an SRE?

Ops again.

You had all that before you started your devops journey.

It’s all just ops, with slightly better tooling, fewer outages, higher reliability and absolutely zero increase in product velocity.

Devops was the promise that by bridging operations and development you could get high reliability and faster iteration by having teams that could “cut through” the red tape and get things done.

That appears, largely, not to work.

Yes, developers that understand systems tend to build more reliable software.

No, it is not faster to do it that way, and the transition will be painful, and, because businesses mostly care about iteration speed more than reliability, even when it is a technical success, it often fails to deliver on its business value.

It has worked well for organizations that embraced it. It hasn't worked well for organizations that paid it lip service. That's the way of the world. There is a path laid out by DevOps methods and dozens of ways to get there, but the path doesn't walk itself.

Note that this admits strategies where nothing of consequence is ever delivered (but each deploy has some quantifiable and measured churn), and the people that break stuff get credit for fixing the stuff they broke.

I've watched this particular breed of organizational cancer destroy many companies and products.

The end game is that people creating useless but highly visible churn get promoted, as do the ones that repeatedly break stuff. Even if that doesn't happen, the engineers that want to build stuff inevitably flee.

That's where value chain management comes in. If you can't show business value being delivered, there's no point to any of it.

It's also worth acknowledging when you don't need DevOps. Banks, for example, shouldn't need it. Their entire purpose is to be slow and reliable. Most of their money is literally just old people keeping lots of money in one place and not touching it. They shouldn't need to churn on features and ship constantly.

Sounds like the organization went full cargo cult with pets (lift and shift) instead of a devops-ready environment.

I've not seen one place that has escaped this problem though.

Something is either old and nobody feels like fixing it, or it doesn't fit into the current constraints of the platform's feature set. So they build something on their own, probably undocumented and without consulting the supposed DevOps team, but still using parts of the platform that have not been declared an API. But when you want to exercise the freedom you supposedly built with your platform, all these edge cases fall back on you and inhibit change. "We can't change the ingress controller, we rely on this implicit behavior of it." "You can't change how database credentials are provisioned, we pulled these from the Cloud SQL console and committed them to git." And facilitating any change as soon as you can cover the use case is a fight with stakeholders and POs that I usually have no nerve for. "Why do we need to do anything??? It works?" And then you get blamed when it breaks. I love this job.

> We ended up spending the majority of our time manually applying security patches to a bunch of snowflake servers

Here is where it went wrong. The app should have been re-deployed properly, with the full infrastructure as code. No patches, a new AMI instead.
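As a rough sketch of what "no patches, new AMI instead" means in practice: instead of patching servers in place, you launch replacements from a freshly built image and retire the old ones. The code below is only a simulation under stated assumptions. The `launch`, `is_healthy` and `terminate` callables are hypothetical stand-ins for whatever your cloud tooling provides, not real AWS calls, and the image IDs are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    instance_id: str
    image_id: str


def roll_to_new_image(fleet, new_image_id, launch, is_healthy, terminate):
    """Replace instances one at a time; nothing is ever patched in place.

    Returns (resulting_fleet, success). On the first unhealthy replacement the
    rollout stops and the remaining old instances are left untouched.
    """
    result = []
    for index, old in enumerate(fleet):
        if old.image_id == new_image_id:
            # Already built from the target image, nothing to do.
            result.append(old)
            continue
        replacement = launch(new_image_id)  # hypothetical launcher
        if not is_healthy(replacement):     # hypothetical health check
            terminate(replacement)
            # Abort: keep this old instance and every one not yet processed.
            return result + [old] + list(fleet[index + 1:]), False
        terminate(old)
        result.append(replacement)
    return result, True
```

A security fix then becomes "build a new image, run the rollout", and a snowflake server that cannot survive being replaced shows up as a failed health check rather than as a surprise in production.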

Isn't it apparent in the name? MarketOps, ServerOps, PaymentOps - all Operations