Hacker News
by devjab 228 days ago
We're a much smaller company, and what we lose on these things is insignificant compared to what's in this story. Yesterday I was improving the process for creating databases in our Azure environment and stumbled upon a subscription that was running 7 MSSQL servers for 12 databases. These weren't elastic, and each was paying for a license that we don't have to pay for, because we qualify for the base cost through our contract with our Microsoft partner. This company has some of the tightest control over its cloud infrastructure of any organisation I've worked with.

This is anecdotal, but if my experiences aren't unique, then there is a widespread lack of reasonable practice in DevOps.

Isn't that mostly down to the fact the vast majority of devs explicitly don't want to do anything wrt Ops?

DevOps has - ever since its originally well-meaning inception (by Netflix iirc?) - been implemented across our industry as an effective cost-cutting measure, forcing devs who didn't see it as their job to also handle it.

Which consequently means they're not interfacing with it whatsoever. They do as little as they can get away with, which inevitably means things are being done with borderline malicious compliance... Or just complete incompetence.

I'm not even sure I'd blame these devs in particular. The devs just saw it as a quick bonus generator for the MBA in charge of this rebranding while offloading more responsibilities onto their shoulders.

DevOps made total sense in the work culture where the concept was conceived - Netflix was well known at that point for only ever employing senior devs. However, in the context of the average 9-5 dev, who often knows a lot less than even some enthusiastic Jrs... Let's just say that it's incredibly dicey whether it's successful in practice.

I politely disagree. I spent maybe 8 hours over a week rightsizing a handful of heavy deployments from a previous team and reduced their peak resource usage by implementing better scaling policies. Before the new scaling policy, the service would quite frequently scale out, and the new pods would sit idle and ultimately get terminated without ever responding to a request.

The service dashboards already existed; all I had to do was a bit of load testing and read the graphs.

It's not too much extra work to make sure you're scaling efficiently.
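
As a rough illustration of what a scaling policy decides, here is a minimal sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: ceil(currentReplicas * currentMetric / targetMetric). The thread does not say which autoscaler was involved, so Kubernetes, the function name, and the default bounds are assumptions; the 10% tolerance mirrors the HPA default.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the HPA formula, clamped to [min_replicas, max_replicas].

    current_metric and target_metric are in the same unit, for example
    average CPU utilisation in percent. All defaults are illustrative.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: do not scale, which avoids flapping.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With a target of 60% average CPU, 4 pods at 90% scale out to 6, while 4 pods at 62% stay put. Picking the target from load-test graphs rather than guessing is what keeps new pods from coming up with nothing to do.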

You disagree but then cite another example of low-hanging fruit that nobody took action on until you came along?

Did you accidentally respond to the wrong comment? Because if anything you're giving another example of "most devs not wanting to interface with ops, hence letting it slide until someone bothers to pick up their slack"...

The first time my director asked me if I'd ever heard of DevOps, I said, "Sure, doing two jobs for one paycheck." I'm a software developer, buddy. I write the programs. Leave me out of running them.
> Leave me out of running them.

This is how customers end up with too-expensive Rube Goldberg machines.

You have to take some interest in how your code will run in production, even if you don't personally "operate" it.

Here's the extent of my interest: I take my understanding of your use case and specifications, then I write source code that tries to generate as few instructions to suit your needs as possible while still being comprehensible to the next maintainer.

The app should write records to a database? Fine. Here's where you configure the connection. The app in production is slow because the database server is weak? Not my problem, talk to your DBA.

The app should expose an HTTP endpoint for liveness probes? Fine. It's served from the path you specified. You reused it for an external outage check, and that's reporting the service is down because the route timed out due to your ops team screwing up the reverse proxy? Literally not my problem, I could not care less.
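
For reference, the kind of liveness endpoint described here is tiny. Below is a minimal sketch using only Python's standard library; the `/healthz` path and the names `LivenessHandler` and `make_server` are hypothetical, not from the thread. It only answers whether the process can serve a request, so it says nothing about a reverse proxy sitting in front of it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

LIVENESS_PATH = "/healthz"  # hypothetical path; use whatever ops specified


class LivenessHandler(BaseHTTPRequestHandler):
    """Answers 200 on the liveness path and 404 everywhere else."""

    def do_GET(self):
        if self.path == LIVENESS_PATH:
            status, body = 200, b"ok"
        else:
            status, body = 404, b"not found"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep probe traffic out of stderr; probes fire every few seconds.
        pass


def make_server(port=0):
    """Bind to localhost; port 0 asks the OS for any free port."""
    return HTTPServer(("127.0.0.1", port), LivenessHandler)
```

Calling `make_server(8080).serve_forever()` would run it in the foreground. An external outage check that goes through the proxy measures a different path from this probe, which is the mismatch being complained about.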

Allow me to politely pick apart the "Not my problem, talk to your DBA" comment from the perspective of someone who's worn every IT hat there is.

Okay, so, what is the DBA to do? Double the server capacity to "see if that helps"?

It didn't, and now the opex of the single most expensive cloud server is 2x what it was and is starting to dwarf everything else... combined.

Maybe it's "just" a bad query. Which one? Under what circumstances? Is it supposed to be doing that much work because that's what the app needs, or is it an error that it's sucking down a gigabyte of data every few minutes?

How is the DBA to know what the use cases are?

The best tools for solving these runtime performance problems are modern APM tools like Azure Application Insights, OpenTelemetry, or the like.

Some of these products can be injected into precompiled apps using "codeless attach" methods, and this works... okay at best.

So SysOps takes your code, layers on an APM, sees a long list of potential issues... and the developers "don't care" because they think that this is a SysOps thing.

But if the developer takes an interest and is an involved party, then they can integrate the APM software development kit, "enrich" the logged data, log user names, internal business metadata, etc... They log on to the APM web portal and investigate how their app is running in production, with real-world users instead of synthetic tests, with real data, with "noisy neighbours", and all that.

Now if Bob's queries are slowing down the entire platform, it's a trivial matter to track this down and fix Bob's custom report SQL query that is sucking down SELECT * FROM "MassiveReportView" and killing the entire server.
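
To show what "enriching" the logged data buys you, here is a minimal sketch in plain Python that stands in for a real APM SDK. Everything in it (`QueryRecord`, `traced_query`, `total_seconds_by_user`) is a hypothetical name for illustration and not the API of Application Insights or OpenTelemetry. The idea is only that each timed query carries the user name with it.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class QueryRecord:
    user: str          # business metadata the app knows and the DBA does not
    query_name: str
    seconds: float


@contextmanager
def traced_query(user, query_name, sink):
    """Time the wrapped block and append a QueryRecord to sink."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append(QueryRecord(user, query_name, time.perf_counter() - start))


def total_seconds_by_user(records):
    """Total query time per user, heaviest user first."""
    totals = {}
    for record in records:
        totals[record.user] = totals.get(record.user, 0.0) + record.seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

In the app, the database call would sit inside `with traced_query(current_user, "massive_report", sink):`. Once the user name travels with the timing, sorting by total time is enough to put Bob at the top of the list.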

Troubleshooting, performance, security, etc... are all end-to-end things. Nobody can work in isolation and expect a good end result.

DBAs don't necessarily need telemetry in an app to diagnose an issue with the app's behavior. They can run a trace and see some SELECT is running a thousand times a second and deduce that it's being called in a loop over the result set of an earlier query. And they'd be right to say hey, this is an app issue, open a ticket with the developer.
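
The pattern described here is usually called an N+1 query. Below is a minimal sketch of how it shows up in a statement trace, using SQLite from Python's standard library as a stand-in for a production database. The schema and function names are made up for illustration; `set_trace_callback` is the real `sqlite3` hook.

```python
import sqlite3
from collections import Counter


def build_demo_db(rows=100):
    """In-memory database with one customer per order (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    """)
    ids = range(1, rows + 1)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in ids])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i) for i in ids])
    conn.commit()
    return conn


def n_plus_one(conn):
    """One query for the orders, then one lookup per row of its result set."""
    names = []
    for (customer_id,) in conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall():
        row = conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
        names.append(row[0])
    return names


def single_join(conn):
    """The same result fetched with a single statement."""
    rows = conn.execute(
        "SELECT c.name FROM orders o JOIN customers c ON c.id = o.customer_id ORDER BY o.id"
    )
    return [row[0] for row in rows]


def count_statements(conn, fn):
    """Run fn(conn) under a trace and count statements, ignoring WHERE values."""
    seen = Counter()
    conn.set_trace_callback(lambda sql: seen.update([sql.split(" WHERE ")[0]]))
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, seen
```

Running `count_statements(conn, n_plus_one)` shows the customer lookup executed once per order, while `single_join` issues a single statement for the same result. The trace points at the loop, but only someone with the source can rewrite it as a join.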

If you put that responsibility on the developer--meaning you expect the dev to diagnose an issue that they introduced in the first place--what kind of result do you think you're going to get?

Layering these demands takes away from the overall quality of the application in my experience. You want an app developer to learn all about Prometheus so the app can have an endpoint with all these custom metrics, okay, and you want structured logging and expect the dev to learn how to use Kibana effectively? All that's a huge cognitive burden that eats a slice of the same pie (their brains) as domain knowledge, language & runtime knowledge, etc.
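
For a sense of what the metrics endpoint involves at its simplest: Prometheus scrapes plain text in its exposition format. Below is a minimal hand-rolled sketch of rendering one counter; the function names and the `app_reports_total` metric are hypothetical, and a real service would more likely use an official client library than format this by hand.

```python
def _escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_counter(name, help_text, samples):
    """Render one counter in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples:
        pairs = [f'{key}="{_escape_label_value(val)}"' for key, val in sorted(labels.items())]
        if pairs:
            lines.append(f"{name}{{{','.join(pairs)}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```

Calling `render_counter("app_reports_total", "Reports generated.", [({"user": "bob"}, 3)])` yields a HELP line, a TYPE line, and one sample line. The format itself is small; the cognitive cost being argued about is in choosing what to measure and learning the tooling around it.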

Get maybe one app developer to specialize, get maybe one app developer to cross-train with ops or monitoring even. But leave most of us out of it.

When you flip that expectation of developer involvement in operations, it exposes how unreasonable that arrangement is. Hey, DBA, the app is sucking up resources. Why don't you crack open an IDE and write a patch for it? What do you mean you don't know Go, what do you mean you don't use Git? Every DBA should know how to attach a debugger to a remote process, shouldn't they?

It's just exploitative. Or at least that's been my experience, so there's my bias.