Hacker News
Ask HN: Impressions of Google SRE?
28 points by im_not_the_one 1351 days ago
Hello HN. I'd like to pick your collective brain.

I have 18 years of experience as a software engineer, and I'm considering a role inside Google's SRE (Site Reliability Engineering) organization. I have very little operational experience; my career so far (and certainly in recent history) has been almost pure development.

I'm wondering if anybody could share their experiences from inside the Google SRE group.

1. For people who went from being a pure developer to an SRE, how hard was the adjustment?

2. How do you feel about being on-call?

3. How much will my lack of operational experience hurt me?

4. What is the balance between operational work and project work?

5. For people who decided to leave the SRE group, why did you decide to leave?

6. Any regrets?

I have another potential match with a non-SRE team, and I'm weighing both options.

Thank you in advance for any information or advice. It's a big decision.

9 comments

As always your team matters the most. I went into SRE to learn how to make gigantic and complicated systems reliable. I've learned a lot.

1. It was an interesting adjustment. The work is qualitatively different but it's still software engineering. Just at a higher level.

2. On-call is the best I've ever had at any company. It's 12h a day max and overtime is compensated. You won't be woken up in the middle of the night. There's different tiers as well. I'm tier 1 so that means a five minute response time. That's too much for some people and I don't blame them.

3. My team is ramping up people straight out of university. You'll be well trained and have half a year or longer depending on how critical the service is.

4. In my org it's something like 25% of your time on-call at most. Depending on the size of your team it's less.

5. Not applicable. My first SRE job.

6. None so far. Google is very nice to work for. Like all jobs it'll be stressful at times but it's better here than anywhere else I've worked.

What do you mean you won’t be woken up in the middle of the night? Someone is going to need to wake up in the middle of the night.

There’s another team in the opposite timezone that handles on-call, for the majority of Google SRE teams.

Additionally as noted elsewhere, on-call time is directly compensated.

With a 5 minute response time service: you are credited 2/3rds of the hours that you are on-call outside of local business hours. These can be used as additional paid time off, or paid out in cash.

With a 30 minute response time service the hours are credited as above, at 1/3rd time.
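
As a rough illustration of the arithmetic above, here is a minimal sketch. It assumes only the two tiers mentioned (5 minute and 30 minute response times) and that credit applies solely to on-call hours outside local business hours. The function and constant names are hypothetical, not anything Google uses.

```python
from fractions import Fraction

# Credit rates as described above: 2/3 time for a 5 minute response
# service, 1/3 time for a 30 minute response service.
# CREDIT_RATE and oncall_credit_hours are hypothetical names.
CREDIT_RATE = {5: Fraction(2, 3), 30: Fraction(1, 3)}


def oncall_credit_hours(off_hours_on_call, response_minutes):
    """Return credited hours for on-call time outside business hours."""
    if response_minutes not in CREDIT_RATE:
        raise ValueError("only the 5 and 30 minute tiers are modeled here")
    if off_hours_on_call < 0:
        raise ValueError("hours on call cannot be negative")
    return CREDIT_RATE[response_minutes] * off_hours_on_call


# A 12 hour off-hours shift under each tier.
print(float(oncall_credit_hours(12, 5)))   # 8.0
print(float(oncall_credit_hours(12, 30)))  # 4.0
```

So a 12 hour off-hours shift would be worth roughly 8 hours of credit on a 5 minute service and 4 hours on a 30 minute service, taken either as PTO or as cash.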

Oh that’s cool. I should consider that for my org.

One big benefit of directly compensated on-call time is it removes much of the difficulty of covering shifts.

Instead of everyone feeling like they have to keep track of who owes each other a shift covered, it’s just paid out. In vacation hours or cash. So if you pick it up, no stress, you’ve been paid for it.

1) I have more of a sysadmin background, so someone else will have to answer this. I will say that if you interviewed in the sre-swe track, transferring to a swe role is not difficult.

2) I have been on call for most of my career, but google is the only employer that has actually provided compensation. Depending on your lifestyle, the extra pto can be really nice (for example it’s easy for me to ski on less-crowded days), but as I’ve gotten older being on call has definitely gotten more painful (now that I have a serious partner, not being able to do things on the days she has off can be frustrating.) The actual difficulty/stress of the pages you get will be highly team-dependent, but SRE does a reasonable job of training and tracking pager load.

3) This is highly team-dependent, but there are many teams where systems skills are not a big deal. You need to be able to think in a reliability-focused way, though.

4) Some teams actually develop their own code, other teams rely on swe teams for most of the code. You will read a lot more code than you will write, and you will write less code than they say you will during the interview process. This is also true for SWEs - everyone spends more time big-company-ing than computer-ing.

5) I have not done this, but I have considered moving to a SWE role because there’s more open source opportunities on that side.

6) I’ve found google to be the best place to work of the mega corporations I’ve been, but honestly if the money was the same I’d prefer a smaller company. I’m better at tasks like “chase this bug through a bunch of layers until you find some weird kernel behavior” than “explain why your rollout plan is compliant with our reliability directives.”

I spent a few years as both an SRE and a SWE at Google. I joined Google in an SRE team - prior to that, I was a developer at another company who did have a lot of operational experience. My experience is a bit dated but I can try to answer your questions as best as I can.

In my case, the adjustment from SWE to SRE was not very difficult since I had done a lot of operational work before.

The oncall was not very intense even though my team supported a critical product. IMO, Google handles operations and SRE much better than most other companies I have worked at. My SRE team was split between two locations in different timezones, so I never had to be woken up at night. I was also very well compensated for time spent oncall (as bonuses), which I have not seen at other companies.

I don't think your lack of operational experience will hurt, if you have the learning mindset. My goal was to learn how Google operates services at scale and even with previous operational experience, I learned a lot. I had coworkers who came from a pure development background and thrived in the role.

The split between operational work and project work can depend upon the team. The goal is to have a healthy balance and some teams manage this better than others. However, project work may not always involve development and it is easy to feel distanced from the user. This was one of the reasons I switched out of SRE after a few years - I felt like I wasn't coding enough and building features for users.

I definitely have no regrets about being on an SRE team - I learned a lot and would consider going back to join one again.

edit: for better readability

Thanks for your thoughts.

My impression is that it's fairly easy to move around within Google. Was that your experience?

I realize that the only way I will learn if I like SRE is to actually do it. And it feels less risky (I'd probably be moving across the country) if I know that I won't be "stuck" in a role that I don't enjoy.

Yes, it was pretty easy to switch to a SWE role within Google. I just had to talk to a few managers to identify a mutual fit and that was pretty much it - there is no formal interview (which I've seen some other companies require to switch teams). And the fact that I had SRE experience was actually a positive when it came to switching.

I'm not exactly the right person to answer this since I haven't done SRE at Google, but I did turn down an offer to be a Google SRE, and then subsequently went to work in a heavily SRE oriented organization, and eventually landed as a SWE at Google.

Some thoughts:

- being on-call can be anything from soul-sucking to mildly annoying depending on what team you land on, but a lot of SWEs are also on-call, including at Google

- a lot of the entry SWE positions at G (any level) are things that they have trouble recruiting for internally, so unless you know exactly what team you're joining you may not find it as exciting as you would like

- Google's SREs are top notch and you can expect to develop some new skills and learn a lot

- SRE work on a good team is actually pretty fun if you like problem solving and can deal with mild stress. AFAICT most SREs are very big into no-blame culture, at least at good organizations

- retrospectively I probably would have enjoyed Goog SRE as much, maybe even more than I did being a SWE, which was: mostly

I think the SRE role in Google is the best implementation of the role. They have 2 tracks called SRE-SWE and SRE-Systems Engineer, and both are actually expected to write software the majority of their time; the only difference from a normal SWE is where you focus, I guess.

However, your life as an SRE after Google, if you don't take a SWE role, can be very different from what you experience at Google. The reality is that at the majority of companies, system administration teams are labelled as SRE as a hiring tactic, and code base quality, the type of things you develop, and the amount of project work vs toil vary massively, usually for the worse.

My recruiter is quick to remind me that this is an SRE-SWE role.

Yes, my impression from the SRE book is that Google has put a lot of thought into the role. It's good to hear that reality lives up to the expectation.

Thanks for the reply.

There is very little headcount right now, team matches are taking forever. I'm not sure how far you are in the process, but if you want a new job SOON, this is probably a bad time to join. If you want to get in the pipeline now because you want to get a job in six months, this might be a good time.

Source: Did interviews in early, passed HC for senior SWE in SRE in July with a team match, and now I'm stuck in team match purgatory.

Why are you joining as an SRE instead of a SWE? Is it the role you want, or is it just your foot in the door?

When you were doing "pure development", were you involved in the operational side of things at all?

Personally, I am also what I would call a "pure developer", and I find SRE work really stressful and a very different skillset from what I'm good at. I've often thought that calling ourselves "software engineers" is pretentious, but I would be okay with SREs calling themselves engineers because of the kind and style of work they do. I would say the things I do as a "pure developer" are more akin to creative expression than engineering though.

Why are you joining as a SRE instead of a SWE?

My reading-between-the-lines is that Google has more open SRE positions than non-SRE positions. SRE wasn't what I was necessarily pursuing. But no job is perfect and this seems like a good opportunity.

Primarily, the reason I'm still considering this one was the people. I spoke with my potential manager and her boss. I think I'm at a point in my career where a good management fit is just as important as a good work fit. When I asked about their management philosophy, they both gave great answers.

---

When you were doing "pure development", were you involved in the operational side of things at all?

My current project doesn't really have "operations" per se. It's a plugin to desktop software. So I do sometimes get sucked into customer issues when they get to be too tricky for support to handle, but that's pretty rare.

I do application-level administration of our source control server, and I was previously responsible (briefly) for our build infrastructure. But there is no "on call" for either.

---

I find SRE work really stressful and very different skillset from what I'm good at

To clarify, do you do SRE work at Google or at another company? My impression is that SRE has a different tone at Google than in most other places.

---

I would say the things I do as a "pure developer" are more akin to creative expression rather than engineering though.

I would say similar things. I think it's often a weird melange of math and art. We need to be much more precise than with other forms of creative expression, but creativity plays a huge role in it.

Still, I wonder if I could gain something by embracing the "engineering" mindset. It would be a stretch, sure, but I think it could be a good stretch for me. That's why I haven't dismissed SRE completely.

I'm a SWE @ google, but I work on an infrastructure team so it bumps up dangerously close to the SRE world. I go on-call once every 2 months or so, so I end up essentially being an SRE that week. The SREs that I've met at Google are world-class, so if you do think being exposed to that kind of rigor would benefit you, Google is definitely the place to earn your stripes.

Personally though, I have a hard time sleeping during that on-call week, even if I'm not being paged, just from the stress of possibly being paged at any time of night and needing to go fix a production issue.

But, some people thrive with that, and there is a fairly broad range of work that falls under the SRE umbrella - you've got your fire-fighting SREs, your tool-building SREs, your project-consulting SREs, etc...you would probably be able to find a role that fits your talents/desires within the organization.

Like most Google jobs, but with extra pretentiousness.

Is on call optional at Google?

I don't know about other SRE groups, but from what they told me about this particular group, no.