Hacker News new | ask | show | jobs
by throwaway823882 1873 days ago
I agree with this post 200%.

> Irecently spent some time trying to write a set of general guidelines for what to monitor in a software system

Reframe as "Shit That Needs To Run To Make The Customer Happy" and you get closer to what you want. Which is to say, it's completely product-specific. A general list of technical things to monitor is about as useful as monitoring the cotton thread fiber integrity of a pair of shoelaces. Is it the cotton thread fiber integrity what you care about, or a general quality of the shoelaces? Are they shitty laces, or just decent, or great laces? Quantify that.

> Typically, the former kind of PRR will take a quarter or more, because invariably, new large services have a significant amount of tasks to do to get them production-ready. The SRE team onboarding the service will spend significant time finding gaps, understanding what happens when dependencies of the service fail, improving monitoring and runbooks and automation.

I deal with these a lot of the time, and I hate them because they are so stupid. We could make these reviews completely self-service and automated and they'd move a lot faster, and could even be on-going as the product is actually released to customers. But SRE and Architecture remain their own silos, and neither of them work closely enough with the product team or core engineering groups to find the streamlined, agile ways of doing these things. Basically, none of them grok the concept of finding quicker and better ways to get this shit done. Or they just don't care to.

> The second kind of PRR typically does not uncover much to be done, and devolves into a tick-box exercise where the developers attempt to demonstrate compliance with the organisation’s production standards.

Architecture and SRE don't explain to the product team WTF they are going on about, so of course they just tick boxes mindlessly. Nobody wants to stop and understand the whole picture, so you end up with empty formalism.

The way to "formalize" and "standardize" the operationalizing of a product is to make it clear what the fuck is going on at each stage of your product. Who the fuck are my customers? What the fuck are they doing with the product? How the fuck does the product work for them, and internally? What the fuck are the external dependencies and how do they work? You need simple, practical ways to express these things.

And you also need to train people as to why everyone needs to understand these things. Why you cannot just allow someone to sit in their little corner of a room and jerk off and collect a paycheck. I often hear it from developers ("I just want to write code") but literally everyone else in the organization does it too.