by FroshKiller
220 days ago
Here's the extent of my interest: I take my understanding of your use case and specifications, then I write source code that tries to generate as few instructions to suit your needs as possible while still being comprehensible to the next maintainer. The app should write records to a database? Fine. Here's where you configure the connection. The app in production is slow because the database server is weak? Not my problem, talk to your DBA. The app should expose an HTTP endpoint for liveness probes? Fine. It's served from the path you specified. You reused it for an external outage check, and that's reporting the service is down because the route timed out due to your ops team screwing up the reverse proxy? Literally not my problem, I could not care less.
Okay, so, what is the DBA to do? Double the server capacity to "see if that helps"?
It didn't, and now the opex of the single most expensive cloud server is 2x what it was and is starting to dwarf everything else... combined.
Maybe it's "just" a bad query. Which one? Under what circumstances? Is it supposed to be doing that much work because that's what the app needs, or is it an error that it's sucking down a gigabyte of data every few minutes?
How is the DBA to know what the use cases are?
The best tools for solving these runtime performance problems are modern APM tools like Azure App Insights, OpenTelemetry, or the like.
Some of these products can be injected into precompiled apps using "codeless attach" methods, and this works... okay at best.
So SysOps takes your code, layers on an APM, sees a long list of potential issues... and the developers "don't care" because they think that this is a SysOps thing.
But if the developer takes an interest and is an involved party, then they can integrate the APM software development kit, "enrich" the logged data, log user names, internal business metadata, etc... They log on to the APM web portal and investigate how their app is running in production, with real-world users instead of synthetic tests, with real data, with "noisy neighbours", and all that.
Now if Bob's queries are slowing down the entire platform, it's a trivial matter to track down and fix Bob's custom report, whose SQL query is running SELECT * FROM "MassiveReportView" and killing the entire server.
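To make that concrete, here is a rough sketch of what "enriching" looks like from the developer's side. It deliberately uses only the Python standard library as a stand-in for a real APM SDK, so every name in it (traced, RECORDED_SPANS, run_report, rows_by_user, heaviest_user) is made up for illustration and is not any vendor's API. The idea is the same, though: tag each database call with the user and the query, record how much data came back, and the "who is hammering the server" question becomes a simple aggregation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Span:
    """One timed operation plus the business metadata attached to it."""
    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0


# Hypothetical stand-in for the APM backend; a real SDK would export spans.
RECORDED_SPANS = []


@contextmanager
def traced(name, **attributes):
    """Time a block of code and record it with the given attributes."""
    span = Span(name, dict(attributes))
    start = time.perf_counter()
    try:
        yield span
    finally:
        span.duration_s = time.perf_counter() - start
        RECORDED_SPANS.append(span)


def run_report(user, sql, fake_row_count):
    """Pretend to run a query; fake_row_count replaces a real database call."""
    with traced("db.query", user=user, sql=sql) as span:
        rows = list(range(fake_row_count))  # placeholder for cursor.fetchall()
        span.attributes["rows_returned"] = len(rows)
    return rows


def rows_by_user(spans):
    """Total rows pulled from the database, grouped by the 'user' attribute."""
    totals = defaultdict(int)
    for span in spans:
        user = span.attributes.get("user", "unknown")
        totals[user] += span.attributes.get("rows_returned", 0)
    return dict(totals)


def heaviest_user(spans):
    """Return the user responsible for the most rows returned."""
    totals = rows_by_user(spans)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    run_report("alice", "SELECT id FROM orders WHERE id = 1", 1)
    run_report("bob", 'SELECT * FROM "MassiveReportView"', 50_000)
    print(heaviest_user(RECORDED_SPANS))
```

Without the user and sql attributes, all the ops side sees is "db.query is slow sometimes". With them, the offending report has a name attached.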
Troubleshooting, performance, security, etc... are all end-to-end things. Nobody can work in isolation and expect a good end result.