Ask HN: How do you organize a platform team?
6 points by davideberdin 1380 days ago
I am a new manager of a large team (12 SREs) that takes care of the Kubernetes platform at my company. The team is responsible for the provisioning pipelines (for both bare metal and AWS; no EKS is used), the Kubernetes controllers that integrate with other custom services, the observability stack, etc. The total fleet is around 6000 bare-metal nodes and 1000 VMs in AWS, spread over various DCs and regions. Over 1500 developers actively use the Kubernetes clusters every day, with a total of 2500 applications running in production. The team spends a lot of time on operations as well as on compliance issues, vulnerability patching, and customer support. What I'm struggling with is how to drive focus and avoid drowning in operations. The team is large enough that the Scrum process is ineffective.
Every time I try to define teams and split people up, I realise that everything on the platform is so interconnected that the moment I created 2 or 3 separate teams, they would start stepping on each other's toes. What would you recommend?
The other thing is just to split the work into 3 categories: p1 bug fixes, long-term projects, and support work. Each week, just make a note of the time allocated to each based on what's happening (sometimes a p1 fix can take up the whole week). Try to minimise the support burden by creating office hours and defining SLAs for the rest of the company.
Make sure your team is not getting buried in support work. What's going to help them is being able to separate what's an immediate priority from what can be pushed off to tomorrow or the day after. Don't let them get bogged down or pinged constantly. Try to make that request flow async.
And most importantly, give them the time to accomplish tasks they think are most important. They are deep in the trenches and know what's going to be p1 vs not. Trust in their ability to guide the outcomes.