by lmm 1709 days ago
I don't find having high-level architects to be a good pattern. They can make mistakes like anyone else; indeed having people who are no longer day-to-day coding make decisions that they don't feel the effects of makes wrong decisions more likely.

SRE exists to support product functions and, like everything else, should be attached to and understood in terms of those functions. Yes, every product group probably should have its own SREs, so that the product group can own its whole lifecycle. Yes, different groups will do their own things and there will be mismatches and duplicated effort. That's less bad than the alternative.

2 comments

I'm not saying that they should not be active developers, but that they should be people who can enforce change across the entire organization.

My previous job (2500+ devs by the time I left) had an in-house RPC system that was being moved over to gRPC. That project was taking years because teams had no coordination on the process. The decision was made at some level and trickled out to everyone else. There was no single person or group who was in charge of:

- How services would be discovered
- Implementation patterns for how services and methods would be defined
- Standardization of which libraries to use
- Examples and pre-built shared libraries that provide things like tracing, monitoring, retries, etc.
- Advocating for the changes

SRE seems to fall into the position of advocating the business value of development practices that compete with business objectives that can provide value as well. At large organizations, you need a central point that can set development objectives, be the one teams can go to with "this pattern doesn't work for us, we can do this but we need buy-in from other teams" issues, and hand down directives.

Unless you operate in an environment where the only cross-team communication is well-versioned public APIs, you will run into issues where you have conflicting needs between teams and need someone to set a vision (this can be a group of people, rotating people, or a single person; how is not the issue).

The whole idea of enforcing technical mandates across the entire organisation is something I'm very sceptical about. No-one can hope to understand the constraints and requirements that 2500+ other devs are working under. Realistically the cross-team bandwidth is low, so if you don't have well-versioned public APIs then you have barely understood interactions and no clear responsibility when they break.

There are probably some things that do need to be standardised, but if there's a business need for standardisation then product teams should be able to understand and advocate for that (whether that means agreeing something with their directly adjacent product team, publishing something for clients to use, or something else). But in a lot of cases I think just accepting that different parts of the organisation will work differently is the best way forward.

We recently decided as a company that the horizontal responsibilities structure doesn't work well at all, at least not at small scale. This was in our operations rather than in the software/infra teams, but I think there's some general truth here. The more vertically responsible your teams are, the better the final product is, and the more inefficiencies and impedance mismatches you can track down and fix.

For us it meant that the data processing teams have been made part of the drone operators team, so whenever we fly a mission a photography/3d rendering expert will also be part of the team that operates the drone. On paper it's more expensive to have office workers in the field, but in practice it leads to fewer reflights and happier and more productive employees.

I imagine that for the software departments, it could mean that every app development team has at least one member who has good operating system and network infrastructure knowledge, and/or maybe database expertise, so that the team as a whole can operate a feature largely without having to depend on an outside SRE specialist.

And then the SREs that you do have can focus on site reliability, instead of having to constantly tell developers that the way they coded something is bad or whatever.