by kyran_adept 2089 days ago
What if you encounter an nginx bug, or a kernel bug? At some point, when you reach a low enough level you will need some deeper investigation tools and finally access to the machine to get enough data and fix the problem.

Then that would be a sysadmin / DevOps responsibility rather than your application developers. Or at least you'd have those guys involved with the investigation.

But honestly, how often is a web applications bug due to a kernel bug?

a) How can there be a "DevOps responsibility rather than your application developers"? Isn't the whole idea of the word "DevOps" to eliminate such distinctions?

b) In my experience, the application developer is held responsible for the application's behavior in production. In the luckiest .01% of scenarios, there might be an infrastructure engineer with appropriate permissions and free time trolling the Slack support channel at the moment you report the issue. Otherwise 99.99% of the time infrastructure will not acknowledge or investigate anything complicated or subtle with just one service owner complaining. The infrastructure group, organizationally, is graded on shipping new platform features and on coarse KPIs for the performance of the platform as a whole; nobody is getting paid to investigate the weird bugs of some application team somewhere.

> Isn't the whole idea of the word "DevOps" to eliminate such distinctions?

No, it doesn't eliminate such distinctions. My view of DevOps is more about ensuring that automation is used as much as possible to meet objectives.

It's definitely not about making everyone a homogeneous developer unit that can work on every problem.

People come in all shapes and sizes, some are more competent with certain things than others, others have a lot more experience with certain things. That's aside from the whole preference thing - not everyone wants to or has an interest in managing infrastructure.

Maybe when you have a handful of developers and a small set of infrastructure, that's fine - but at a certain point you start to require more and more specialised knowledge. Yes, even when you're all-in on Cloud and using all the SaaS/PaaS products out there.

> Otherwise 99.99% of the time infrastructure will not acknowledge or investigate anything complicated or subtle with just one service owner complaining.

Yeah, that's an organisational problem from the sound of it.

Everyone having their own idea of what devops means is problematic.

I think it’s best to consider the original source which is this talk from Flickr: https://youtu.be/LdOe18KhtT4

("10+ Deploys Per Day: Dev and Ops Cooperation at Flickr").

It talks directly about joining developers and operations into the same team; later Patrick Debois would refer to this as DevOps, and a year later the first DevOps days in Ghent was organised (also by Debois).

Thanks for reminding me that someone else remembers this. It seems like it took no more than 5-6 years for everyone to simply adopt the term as a replacement for sysadmin with no change in operational practices. Coincident with that it seems most infra engineers started calling themselves SREs to distance themselves from the diluted concept.
Yep. It's more of a concept of cooperation than anything else. Which, since it's not a concrete thing, makes it harder for people to understand. But then there are specific practices that arose out of trying to drive best practices in Ops at the same time (like IaC, II, CattleVsPets, automation, etc) so now DevOps "means" a jumble of slightly related things.

We really need some new terminology.

Is that the original devops talk, which is like the origin of the devops "movement"? Very cool, did not know it had such a clear origin.
At my company, a subset of developers have ssh access; if other developers need something that requires ssh access, they work with someone who has it. But unless you are a very early startup, I don't see why every developer would need ssh access.
Yeah, ever since devops became a thing they’ve placed themselves in this position where they’re somehow better than the application developers, even though until a few years ago those same developers were doing the exact same things.

I swear, sysadmins were annoying as an application developer, but devops is something else.

People with one year of actual work experience get hired as devops, and have all the privileges I would need to fix their mistakes, but I can’t, because I’m an ‘application developer’. So instead you end up teaching them how to do their job.

I’m not salty at all.

DevOps is a set of practices not a role. I do Ops, practicing DevOps, and I serve my programmers. My job is to ensure they stay happy. If they're not happy about something in the production pipeline that's on me. I work hard to make sure that I'm their Jesus Christ for all things infrastructure.

If your programmers aren't delighted with you, I'd say that either you, the Ops person, are not practicing DevOps, or you have a buy-in problem with DevOps practices at an organization level.

I love this take.

My previous role had my title as "DevOps Engineer" but it always rubbed me the wrong way. I was just an Operations Engineer with a focus on making my developers' jobs easier, in any way I could. Having that as my North Star kept me honest about the work I was doing versus considering the role more like Operations Engineer v2.0.

In Silicon Valley, at least, DevOps seemed to be (seems to be?) sort of in vogue; I think it's important to keep its core qualities of bridging Development and Operations in mind as opposed to just shifting an existing position's title in an attempt to attract talent.

Preach!!

And this should extend throughout the organization. If Architecture or Security or any other group is making your life miserable, they too should be DevOps'ing, working closely with you, caring about your frustrations that only they can fix. Sadly there are still so many silos left to break up.

Agree, a job title of “DevOps Engineer” is an organizational smell for me.

Most people with such a title are actually something like “Automation Engineers”, “Infrastructure Engineers”, “Operations Engineers”, “Site Reliability Engineers”, etc, that are involved in a DevOps “process”, “initiative”, “culture”, etc.

Ah the classic ivory tower argument where some “other class” of engineers are universally inept, but not “my class!”

You can write the same screed full of generalizations from the perspective of any job title: a devops person would lament the fresh-out-of-bootcamp “application developers” who have no idea how systems work together so write SQL queries that retrieve a million rows, one at a time. “Works on my local!”
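
For anyone who hasn't hit the "one row at a time" pattern mentioned above, here is a minimal sketch of it next to a single batched query. It uses Python's built-in `sqlite3` with an in-memory database; the `items` table and the function names are made up for illustration and are not from any real codebase. With a networked database each of those per-row queries would also be a round trip, which is where the real cost tends to come from.

```python
import sqlite3


def build_demo_db(n_rows: int) -> sqlite3.Connection:
    """Create a throwaway in-memory table (hypothetical schema) with n_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(n_rows)],
    )
    conn.commit()
    return conn


def fetch_one_at_a_time(conn: sqlite3.Connection, ids):
    """Anti-pattern: issue one query per row. Returns (names, query_count)."""
    names = []
    queries = 0
    for item_id in ids:
        row = conn.execute(
            "SELECT name FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        queries += 1
        names.append(row[0])
    return names, queries


def fetch_in_one_query(conn: sqlite3.Connection, lo: int, hi: int):
    """Same rows, fetched with a single range query. Returns (names, query_count)."""
    rows = conn.execute(
        "SELECT name FROM items WHERE id BETWEEN ? AND ? ORDER BY id", (lo, hi)
    ).fetchall()
    return [r[0] for r in rows], 1


if __name__ == "__main__":
    conn = build_demo_db(1000)
    slow, slow_queries = fetch_one_at_a_time(conn, range(1000))
    fast, fast_queries = fetch_in_one_query(conn, 0, 999)
    print(slow == fast, slow_queries, fast_queries)
```

Both functions return the same names; the only difference is how many statements are sent to the database.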

Pretty sure GP was bristling at the reverse happening. We must keep the developers from screwing up the important computers.

Saying the emperor has no clothes is not white tower thinking.

I completely agree. Access to those things should be given to those qualified to work with them, not based on an arbitrary role designation.
I’m sorry you’ve had some bad experiences but not everyone is like that.

However it’s also misguided to assume that specialities don’t exist. You can have infrastructure guys, developers, security folk; and there will be overlap between each role but it’s impossible to be the master of each trade.

I agree that arrogance is an unpleasant trait, but arrogance can take many forms: rudeness to colleagues, or overestimating one's own technical capabilities in adjacent fields.

I've yet to find many organisations that split those responsibilities up enough. This might work for 1% of orgs, wait no - 1% of tech orgs, but everyone else needs something better.
I can't vouch for your experience but as a sysadmin myself I've found 100% of the companies I've worked for have had dedicated sysadmins ;)

I'm being a little flippant here though, I am aware that developers are often asked to wear the sysadmin hat too.

Here's a funny thing: as of two days after this post was created, pairing hasn't been mentioned once in the entire thread. If this thread is any indication, maybe it's developers who have an incomplete understanding of DevOps.
That should be the exception, rather than the rule.
And it is deeply frustrating when you run up against one of these exceptions and need to wade through some bureaucracy before you can investigate further.
It would seem to me that this is the perfect time to pull in someone with more production experience. Perhaps they can use the existing tools to pull logs, or analyse it in some way. Maybe they've seen it before and already know the fix.

Giving everyone production SSH access is, in my experience, a way to run into all sorts of weirdness, not to mention endless frustration.

In a modern automated infrastructure, that box is likely a container running on a virtual machine that's ephemeral and can (and probably will) go away at any moment for any number of reasons - maybe CD kicked off a new deployment, or maybe the load changed and the instance was selected for scale-down, or maybe our spot bid for that AZ isn't sufficient for keeping the instance around, or maybe you being SSHed in and poking around impacted the health-check, and so it's being killed for not performing right.

There are many other problems, too - lots of applications are built in such a way that there's simply no other way than secrets (passwords, API tokens, keys) to reach other systems, particularly third party systems. So production boxes have production secrets, which you probably don't want to share with everyone.

Giving everyone SSH access so they can, in theory, take nginx/kernel dumps as needed tends to imply giving superuser rights, which means they can do whatever they like.

So, yes, pull in someone else - find some way to try and reproduce the problem NOT on production. If that fails, perhaps there's a way to grab enough detail or pull additional logs or network captures to identify the issue. If that fails, well okay, let's SSH in - but we need to coordinate that to ensure that the instance doesn't go away, and that you don't impact production while you do it.
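
To make the ordering above concrete, here is a rough sketch of that escalation ladder as code. The step names and the `next_step` helper are purely illustrative (nothing here is a real tool or policy engine); the only point is that SSH on production sits last and is reached only after the less invasive options have failed.

```python
from typing import Optional, Set

# Ordered from least to most invasive, following the comment above.
# The wording of each step is illustrative, not an official runbook.
ESCALATION_STEPS = [
    "reproduce outside production",
    "pull additional logs or network captures",
    "coordinated SSH session on the affected instance",
]


def next_step(failed_steps: Set[str]) -> Optional[str]:
    """Return the least invasive step that has not already failed.

    Returns None when every step has been tried without success.
    """
    for step in ESCALATION_STEPS:
        if step not in failed_steps:
            return step
    return None


if __name__ == "__main__":
    tried: Set[str] = set()
    step = next_step(tried)
    while step is not None:
        print("try:", step)
        tried.add(step)  # pretend the step did not identify the issue
        step = next_step(tried)
```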

The point people are trying to make is that if you are at the scale where a kernel bug or an nginx bug is borking your app, it's not the developer's job to go poking around the system for a fix. It's the devops/infra people's job. In my world, if you want to investigate an nginx bug... "docker run -it nginx:latest /bin/bash" and go for it... find the issue, reproduce it, then fix it in the pipeline and deploy again. You didn't touch production at all. If your debugging relies on being ON PRODUCTION, you don't suffer from the scale you need to be on there in the first place.
Not all bugs are sufficiently cheaply reproducible outside the environment in which they are observed. It seems silly to tie your hands behind your back when you could just inspect what the computer is doing and then fix it.
Why can't your developers be "devops people"?
I think developers can be devops also - but they are different skills you need to learn and keep up. Someone good at, say, nodejs or python data science may not be the best at CUDA build compilation on CentOS. And being good at both makes you less good at each unless you're working 18hrs a day to keep up with everything.

There is also the case of ratios. An organization probably needs more developers in specific areas than DevOps, so with dedicated DevOps you could concentrate similar work from across several teams to a dedicated DevOps team that knows that work very well.

I've already written a response to this elsewhere in the thread, but developers are not all equal.

You can't hire twenty developers that all have the same skill/inclinations, the same interests, the same experience.

That's not to say that a DevOps Engineer is some super 10x rockstar developer - no, they're going to have the same variations on skill, interests, experience, etc.

It depends on your environment, but there's so much different tech once you count the entire stack, that I don't think it's reasonable to expect any one person to be an expert on all of it, or even a lot of it.

Sure, but "not all developers can be devops engineers" doesn't necessarily imply "none of your developers should have server access".
>What if you encounter an nginx bug, or a kernel bug?

That is the responsibility of system administrators. Application developers have no business on a production machine. If your sysadmins don't have the technical skills to diagnose these problems, they are incompetent and must be replaced.

> If your sysadmins don't have the technical skills to diagnose these problems, they are incompetent and must be replaced.

The actual result of this is that the sysadmins are not replaced, and the application developers end up in an emergency conference call at 3am to tell the sysadmins which buttons to click on the production environment, since they’re not allowed access themselves.

I spent several years at a large multinational cloud provider that gave developers and QA access to production systems and customer PII. That all changed after the company was bought by SAP and operations were integrated. I am amazed that engineers think this is acceptable. It is bad business practice, compromises security, and is illegal in some jurisdictions.
If developers aren’t exposed to the deficiencies in their systems, they have no incentive to reduce SRE pages and triage. Build resilient code with quality documentation and you don’t have to attend a 3am conf call.

DevOps is not a role or role segregation, it’s about aligning incentives and outcomes across functions in an org (hopefully through collaboration, tooling, and knowledge transfer).

The caveat is that if your org is fundamentally broken, none of the above applies or works and it’s all lipstick on a pig.