by porpoisemonkey 2463 days ago
I recently transferred as a developer to a development operations role at a medium-sized company. This article exactly describes my experience - the main focus of the ops team seems to be on build, deployment and monitoring technologies focused on a migration towards containerization running on AWS.

Code and tooling are built on a "what works" basis and no particular attention seems to be paid to the overall design of the software or to testing (likely due to a real or perceived lack of time). On-call rotations (and how to eliminate work for them) are the hot-button discussion topic.

The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?" which makes me think that a lot of developers look down on operations roles and see it as a demotion. I find that quite odd given that 1) ops keeps the product up and the money coming in with relatively little headcount and 2) most of the people working on our ops teams have a formal education in software engineering.

8 comments

I think it’s because ops folk cobble together scripts and tools only to scratch the current itch, rather than thinking through the whole problem and designing and writing software to solve it. Lack of testability is one of the biggest sins I’ve seen in ops. Tools like Ansible, for example, encourage changing systems at run time with complex logic tied to specific deployments. They don’t use IDEs, so there’s no way to jump through the YAML files, get inline help, or know which playbook is run when (it’s like a program with 100s of main()), and there’s no integrated debugger. It’s like the previous 50 years of computer science never happened and they’re starting back in the 1970s.
A little thought exercise: your comment from the POV of an ops person explaining why they look down on devs.

Dev folks often take weeks to make even a small change. It doesn't matter how urgent the need is or how badly the business needs some kind of workaround: they over-complicate every problem, and ops has to spend more time in meetings planning their next project than it would have taken for us to put a working fix in place. Maintainability is one of the biggest sins I've seen in devs. Tools like npm encourage devs to use an overly complex chain of 3rd-party dependencies for even small projects, and they rarely ever update their dependencies after initial deployment, so if a security update to an underlying package is required, or if the application has to be deployed to a new underlying system or the container changes, things can break easily. They depend on IDEs for everything, and can't even use commonly installed system tools like awk or sed, or use netstat to debug simple networking issues. It's like the 1970s never even happened and they never even learned what an operating system is.

Just like developers cobble together apps that are barely operable: hard coding IP addresses, opening connections with no timeouts, and lacking a basic understanding of the difference between DNS and HTTP.
at one point, I assumed hard coding IP addresses and paths to /home/user/whatever/ was just a web dev thing

spoilers: it is not just a web dev thing
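
For anyone wondering what the alternative to hard-coded addresses and timeout-less connections might look like, here is a minimal sketch. The environment variable names `SERVICE_HOST` and `SERVICE_PORT`, the defaults, and both function names are made up for illustration; only `socket.create_connection` and its `timeout` parameter come from the Python standard library.

```python
import os
import socket


def service_address(env=os.environ):
    """Read host and port from the environment instead of hard-coding them.

    SERVICE_HOST and SERVICE_PORT are hypothetical variable names.
    """
    host = env.get("SERVICE_HOST", "localhost")
    port = int(env.get("SERVICE_PORT", "8080"))
    return host, port


def open_connection(timeout_seconds=3.0):
    """Open a TCP connection that gives up after timeout_seconds."""
    host, port = service_address()
    # create_connection resolves the host name and applies the timeout to
    # the connect attempt; the returned socket keeps the same timeout.
    return socket.create_connection((host, port), timeout=timeout_seconds)
```

Passing a host name rather than an IP means the address can change without a redeploy, and the timeout keeps a dead peer from hanging the caller forever.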

Great post! You touched on something there that's been bugging me for the last few years: testing is non-existent in the devops/SRE world, so developers who hate testing and do not understand that it is core to good engineering seem to gravitate towards devops roles, where they can hack their way through their day, all while creating tons of tech debt.
Testing is basic in devops/SRE. How are you going to ensure reliability if you cannot test the shit out of it?

The issue is there’s no formal tool or practices for testing in ops. DSLs are limited, forcing you to use several for different kinds of scenarios. In the end you need to rely on a real programming language to parse the different file formats and ensure your variables are correct. I think developing software for operations is exciting and is going to mature with time. Kubernetes (and its CRDs) is a step forward.
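
To make the "real programming language to check your variables" point concrete, here is a rough sketch of the kind of check I mean. The required keys (`image`, `replicas`, `port`) and the function name are invented for the example, and JSON is used only because it is in the standard library; a real setup would check whatever formats and variables it actually has.

```python
import json

# Hypothetical schema: the keys a deployment config is assumed to need.
REQUIRED_KEYS = {"image", "replicas", "port"}


def _is_int(value):
    """True for real integers, excluding booleans (bool subclasses int)."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_deploy_config(text):
    """Return a list of problems found in a JSON deployment config."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(config, dict):
        return ["top level must be an object"]

    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(config))]
    if "replicas" in config:
        replicas = config["replicas"]
        if not _is_int(replicas) or replicas < 1:
            problems.append("replicas must be a positive integer")
    if "port" in config:
        port = config["port"]
        if not _is_int(port) or not 1 <= port <= 65535:
            problems.append("port must be an integer between 1 and 65535")
    return problems
```

Because it returns a plain list of problems, a function like this can run in an ordinary unit test or as a pre-deploy step, well before anything touches a live system.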

Chef had great support for automated testing. We also generally did a lot of testing on other tools.

Sure, things like ansible lack proper testing support, but that doesn't mean that all of the profession doesn't test.

> The issue is there’s no formal tool or practices for testing in ops.

There is though, it’s called writing normal software. Kubernetes can be a framework to build on, with actual developers who design from high level logic down through to the implementation.

> The #1 question I get asked by the developers is "Did you really want to transfer from development to ops?"

I’m not sure what job listings they’re looking at, but as an ops person (kubernetes) I’m interviewing for jobs with close to half a million a year in total comp.

I think that people perceive being on call and having to work with messy legacy infrastructure as a restriction on their personal freedom and an anti-perk. We also don't compensate well for oncall shifts so that might be part of the reason that people see it as a lesser form of work and not as a special status that's only given to the best engineers. At other organizations it may be different... I hear FAANG companies compensate their on call shifts quite well (typically a percentage of base salary banded by SLA)
Second time I'm hearing (what's for me) a mile-high comp number for kube jobs and I'm now really tempted.

Working as a data scientist - software engineer in a midsize company, I constantly battle amateur ops folk and "backend" fullstack jockeys from introducing kubernetes into a saas product I mostly created by myself (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

My org has seen multi-day downtimes for the entire eng team workflow because the eks cluster went down and they couldn't figure out why. We have four people dedicated in the infra team for this! I'm not really an ops person but I see where the failings of these folks and stacks are, and feel like I might be able to learn to be half-decent if I put in the time. What advice would you give?

Kubernetes is a full time job. If you want to capture 90% of the benefits of containerization without wasting too much time on a complex solution then simply restrict yourself to only use docker with bash scripts and maybe a load balancer if you really want to have HA.

I've found nomad to have lower complexity than kubernetes. But if you cannot directly integrate service discovery into your application, you will need a service mesh, which is an all-or-nothing thing. And using something like traefik's support for consul means you will have to use regular service discovery alongside the service mesh. It's not a huge burden but there should be a better way.

> why would I need kubernetes for that?

They need k8s for their otherwise bleak CV. It's all hype and shiny today.

> (makes money but there are maybe ten users per hour tops, why would I need kubernetes for that?).

The idea is that you move your low traffic app onto servers with other low traffic apps and save money. If they’re just moving your app to a kubernetes cluster by itself it’s probably not worth it.

Whoa where are you seeing these job listings? I don't know if I've ever come across a public listing for a technical role that paid so much.
They don’t really exist. I’m mid career management and hire these types of roles. You go above $200k and you’re just wasting money. There may be a few random openings above this but nothing sustainable once folks realize the talent curve.
They don’t list the salary but the recruiters will tell you — it’s Silicon Valley companies mostly.
Can you share more? I am aware that this comp exists (just see levels.fyi) but haven’t found it for Kubernetes focused roles yet.

Are you talking to FAANGs? Second tiers like Uber, Square, etc? Other?

Both.
After having done a rotation in the dev ops world, I find their lack of design and architecture really disturbing. Services are implemented without edge cases considered. Their tools are some hacked together monster using flavor of the month and some old tech stacks. Unit and system tests aren't written, or if they are, they don't test any edge cases whatsoever. It's hard to test anything locally and most testing is done in test or even prod (Seriously). Documentation is almost non-existent and the implementations of services such as Chef don't follow chef documentation or best practices. Logging is usually non-descriptive and applications written by the ops teams will fail with cryptic errors which they know by heart but make no sense.

"Oh that nil pointer exception on line 53? Yeah that means you don't have IAM permissions".

In short, the ops world lacks what most software engineers would consider basic engineering practices. It really feels like that entire world is just a hackathon project. I know there are some very talented engineers on ops teams. It's just the impression I got from a 6ish-month rotation.
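
For contrast with the nil-pointer-means-no-IAM example, this is roughly what I'd hope to see instead: the low-level failure gets caught where it happens and re-raised with a message that says what to do about it. Everything here (`DeployError`, `fetch_credentials`, the `store` mapping) is a made-up stand-in, not taken from any real tool.

```python
class DeployError(Exception):
    """Error from a hypothetical deploy tool, carrying an actionable message."""


def fetch_credentials(store, role):
    """Look up credentials for a role, explaining the failure if absent.

    `store` is any mapping of role name to credentials; it stands in for
    whatever permission lookup a real tool would perform.
    """
    try:
        return store[role]
    except KeyError as exc:
        # Keep the original exception as the cause, but tell the user
        # what went wrong in terms they can act on.
        raise DeployError(
            f"no credentials for role {role!r}: check that your account "
            "has the IAM permissions this tool needs"
        ) from exc
```

The original exception is still attached as `__cause__`, so whoever maintains the tool loses nothing, and everyone else gets a message they can act on without knowing the code by heart.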

Rant over, this just touched a nerve.

Strange, I'm burdened with a development team that behaves the same, except they usually don't know what the error messages that their own software produces mean, especially not "by heart".

But since the CEO has a development background, always sides with them and in general wants them to build "features", not fix their shit, they never have to take responsibility for anything.

> medium-sized company

this is mind boggling! For med size company to have a separate ops department. I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

Separate OPS department is where dev's happiness and job satisfaction go to die.

I'd imagine this outdated setup in some government agency but not in a med-size company.

> I can't imagine that responsibility for MY code in production (and delivery of the said code to production) lies on a different team/department that are possibly not even on the same floor/building I'm at.

I recently did some interviews with our devs to find out what features we could add to our platform that would provide them with value. The result was interesting and I found that people typically fall into one of two camps: 1) I want to know everything about my deployed service and have tools that alert me and allow me to intervene, or 2) I don't want to know anything about the deployed service runtime and expect operations to handle my issues and alert me when there's a problem.

It sounds like you might be in the former. =)

The initial intention (of DevOps) was to eliminate these discrepancies by eliminating the Ops department altogether. DevOps doesn't necessarily mean devs doing ops, but it means devs and people curious about ops sit together, as one team, as one department, right next to each other. This setup improves practically every aspect of product development, support, and delivery, as well as collaboration, communication, and response times.

Initial intention aside, it feels like there is a general consensus that the VAST majority of the companies that have `devOps` in the job description somewhere are cargo culting and have no clue what they are doing.

If you slap a `Dev` prefix on your Ops department your job postings will look "trendy" but nothing else will actually change.

DevOps is a culture, not a team or department.

Sounds like how companies or teams cargo-cult “Agile” and have no idea what it is or why.
You may deploy your application package (however that is packaged up), but what happens when a hard drive starts to die? It may not _die_, it just may have elevated write latency. What about a RAID controller firmware having issues above a certain IOPS threshold? What about a critical kernel security patch that has to go out, and your application runs on 1,000 servers?

None of those things are related to your code directly, but may interact with it at some level. At some point, you get so far removed from the work on your actual application that it makes sense to move that to another 'Infrastructure' group.

What do you consider a medium-sized company? Or a department even? I'm having trouble imagining a department being in an entirely different building. The companies I've worked for who I considered medium-sized had hundreds of employees and if they had multiple different offices, each department had a physical presence at each office.
Cargo cult doesn't care about the number of employees in a company.
Operations are always looked down on, nothing new here. However, it's funny when I see devops jobs with much higher salaries than the ones for dev roles. I manage an ops team in a non-tech company. While sales get much more attention, we have a better office environment, better equipment, salaries are better and there's no quartermaster using his whip on the deck...
> "Did you really want to transfer from development to ops?" which makes me think that a lot of developers look down on operations roles and see it as a demotion.

I would certainly see it as a departure from what I want to do - design, build and deliver systems. I'm not really in it for the continuity operations.

Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build, at least for a while after they stabilize, is critical if you want to produce quality products. After all, operational flakiness is a UX issue.

> Then, and I don't know of a nice way to say this, the systems you design, build, and deliver are going to be unreliable and flawed.

Sorry but that's utter bollocks. I'm not interested in being in ops, therefore my software is shit?

> Owning (as in caring about and as in business responsibility) the reliability and operation of systems you build ... is critical.

But that's not being in ops. That's about taking an interest in the running system.

You've just jumped on me saying I'm not interested in having a role in ops because I like to build software, and run off to some weird unsupported conclusion that I just write code and abandon it. I don't need to have a business responsibility for the running system in order to support it and be responsive to the ongoing needs of those that do.

> 2) most of the people working on our ops teams have a formal education in software engineering.

You are so lucky. This is incredibly rare. I'm also lucky that my team is the same way. But most teams are not.

Hey dude was wondering how this comment made you feel:

> Cloud monitoring is a saturated market.

It's like yeah, it's saturated, but it's saturated because the infrastructure we're hosting on is still and forever changing. Take some serverless CloudFormation in AWS: there was no good solution for application monitoring until someone specifically started solving for it, because no one in their right mind was going to use CloudWatch and none of the other existing monitoring solutions/tools could fit the bill either unless they started from scratch and solved for that specific new infrastructure.

<Insert shameless Epsagon plug here />

The Cloud monitoring market might seem saturated but that's because there is no "silver bullet" solution given how much infrastructure has changed and continues to change.

CloudWatch works fine and Epsagon's sales tactics are dishonest and shitty in addition to being spammy--I'm still waiting to hear back from a "Cassie" who I don't think actually exists as to where they sourced my email from for their cold-email marketing blasts.

I'll never do business with a company that gross.