by talonx
522 days ago
Rather than focusing on accountability as the starting point, I would suggest building tooling and visibility so that cloud costs are visible across all layers, including application and infra. Once you have this, accountability becomes easier. Each application team should be able to view the total cost of running their service, and can then be held accountable for reducing it when necessary. Without data you are running blind. Cost optimization cannot be solved by a standalone team; it has to be owned by everyone. Source: personal experience reducing cloud costs in a slightly smaller team.
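As a rough illustration of the kind of per-team visibility described above, here is a minimal sketch that rolls billing line items up by team and by layer. The record shape (`service`, `team`, `layer`, `cost_usd`), the service names, and the numbers are all made up for the example; a real setup would read them from whatever tagged billing export your cloud provider offers.

```python
from collections import defaultdict

# Hypothetical billing line items. Field names and values are illustrative
# assumptions, not taken from any real billing export.
line_items = [
    {"service": "checkout-api", "team": "payments", "layer": "application", "cost_usd": 120.0},
    {"service": "checkout-api", "team": "payments", "layer": "infra", "cost_usd": 80.0},
    {"service": "search-api", "team": "discovery", "layer": "application", "cost_usd": 60.0},
    {"service": "search-api", "team": "discovery", "layer": "infra", "cost_usd": 40.0},
]


def cost_by(items, *keys):
    """Sum cost_usd over items, grouped by the given fields."""
    totals = defaultdict(float)
    for item in items:
        group = tuple(item[k] for k in keys)
        totals[group] += item["cost_usd"]
    return dict(totals)


# Total per team, then broken down by layer within each team.
per_team = cost_by(line_items, "team")
per_team_layer = cost_by(line_items, "team", "layer")
```

The only real design choice here is that every line item carries an owner tag; once that holds, any grouping (team, service, layer) is a one-line query.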
The problem arises when 30-odd microservices (each team owning between 5 and 10) talk to each other. In a pre-production setup, changes to a few of these services might have no noticeable impact on cost, but the impact becomes quite apparent in production. When this happens, we definitely notice an increase in the cost and in the unit metric, but we don't know where to start fixing the problem. Right now, this turns into a war-room situation depending on the severity, and I don't think that is sustainable.
In comparison, if we take API latency as a metric, the accountability and ownership are clearly defined: if an API slows down, the team that owns it fixes it. They can work with anyone they need to, but it's their job to fix it.
Did you face similar concerns or issues? I'm not sure whether this is a problem other engineering teams are struggling with, or even one they consider real enough to invest in.
I'm also not sure if there's a "standard" way of doing this that we should be thinking about. So, looking for ideas and thoughts here.
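For the "where do we start" part, one possible first step (a sketch, not a standard practice) is to compare per-service cost between two periods and rank services by the change, so the war room begins with the biggest movers. The service names and figures below are hypothetical, and the approach assumes costs can already be attributed per service.

```python
def rank_cost_changes(before, after):
    """Return (service, delta) pairs sorted by largest cost increase first.

    `before` and `after` map service name -> cost for two comparable periods.
    A service missing from one period is treated as costing 0 in that period.
    """
    services = set(before) | set(after)
    deltas = {s: after.get(s, 0.0) - before.get(s, 0.0) for s in services}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-service costs for the week before and after a release.
last_week = {"checkout-api": 200.0, "search-api": 100.0, "auth": 50.0}
this_week = {"checkout-api": 210.0, "search-api": 180.0, "auth": 50.0}

ranked = rank_cost_changes(last_week, this_week)
top_service, top_delta = ranked[0]
```

This does not explain why a service got more expensive (a change in a caller can drive up cost in a callee), but it gives the owning team a concrete place to start, much like a latency alert does.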