by donavanm 521 days ago
Three problems with this approach.

1. The ongoing “ktlo” (keep the lights on) maintenance cost for a minimal public AWS service is on the order of 6 SDEs. There's a constant stream of security and integration improvements like new IAM policy features, KMS handling, billing & metering, console chrome, docs, etc. These are generally non-negotiable to keep _any_ sort of consistency across AWS.

This is discounting most operations/operational support, as you can't practically scale an on-call rota that small, and your example is probably referring to a small customer base and a very infrastructure-light service. You _could_ move that to a cheaper dev center & more shared ops, but that in itself is a lot of work.

2. There's a lot of overlap between these services. It's often not complete, and they'll have different approaches. But fundamentally Amazon's distributed nature means multiple orgs will solve similar problems that are adjacent but outside their focus, and then realize that “product” is better owned as someone else's feature. Culling needs to happen at that user experience level sooner or later. The linked blog post is a good example.

3. “Pay the average cost” is going to be a shock once customers see how much some of these cost to run. This cost increase is going to feel like extortion for customers who _already use other services_ and probably pay a lot more. Earning ongoing negative feedback over a minor service is not worth it. Nearly all services are priced based on a multi-year, forward-looking marginal cost model. I would very much expect the average cost to be 10x the customer price for a small service like App Mesh that never “reached scale” or got its P&L together. Then add in those dev salaries I mentioned.

Most importantly, those ktlo dev hours in point #1 are an opportunity cost for AWS. They should be spent on new revenue-generating work, not cost reduction, and certainly not a dead end. More than 10 years ago the rough rule of thumb was that a dev year of work should be targeting $12M/yr in revenue generation. I know we didn't consistently hit that, but that's the kind of benchmark that would be used to evaluate staffing. In AWS today, if the overall product doesn't have a line of sight to $500M, it's not a viable investment.
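
To put that opportunity cost in numbers, here is a back-of-the-envelope sketch using only the two figures quoted in this comment (6 SDEs of ktlo, a $12M/yr revenue target per dev year). The names are made up for illustration, and the benchmark is a decade-old rule of thumb from memory, not an official AWS number.

```python
# Back-of-the-envelope opportunity cost of ktlo staffing.
# Both inputs are the rough figures quoted in the comment, not official data.

KTLO_SDES = 6                               # minimal ktlo headcount from point #1
REVENUE_TARGET_PER_DEV_YEAR = 12_000_000    # rule-of-thumb target, $/yr per dev year


def forgone_revenue_target(sdes: int, target_per_dev_year: int) -> int:
    """Yearly revenue the same engineers would be expected to target elsewhere."""
    return sdes * target_per_dev_year


forgone = forgone_revenue_target(KTLO_SDES, REVENUE_TARGET_PER_DEV_YEAR)
print(f"Forgone revenue target: ${forgone:,}/yr")
```

By that yardstick, a small service has to justify tying up engineers who would otherwise be expected to chase $72M/yr of new revenue, before counting their salaries.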


explaining that KTLO requires 6 SDEs is something I wish every executive knew.
It definitely scales with complexity/surface area, and I'd say that straight number is a lower-bound fixed cost that you might see from an unsuccessful “small” service. In larger orgs it's better expressed as a percentage, because it's relative to overall investment/growth. My personal goal would be 20% ktlo, though it's usually much closer to 40% in practice. All based on my personal experience doing the cloud thing across 2 companies for 15 years or so.
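
One way to connect the fixed floor to the percentages: a rough sketch of how big a team has to be before a 6-SDE floor shrinks to 20% or 40% of headcount. The function name is made up, and treating the floor as constant while the team grows is a simplifying assumption (in practice it scales with surface area, as noted above).

```python
# How large must a team be for a fixed ktlo floor to equal a given share?
# Assumes the floor stays constant as the team grows (a simplification).

KTLO_FLOOR_SDES = 6  # lower-bound fixed cost quoted in the thread


def team_size_for_share(ktlo_sdes: int, share_percent: int) -> float:
    """Total engineers at which ktlo_sdes makes up share_percent of the team."""
    return ktlo_sdes * 100 / share_percent


for share_percent in (20, 40):
    total = team_size_for_share(KTLO_FLOOR_SDES, share_percent)
    print(f"{share_percent}% ktlo -> team of {total:.0f} engineers")
```

So a team of about 15 would be sitting at the 40% "in practice" figure from the floor alone, and it takes roughly 30 engineers to get down to the 20% goal.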

And I will give credit to AMZN/AWS, of old at least, that the above would be common knowledge, with an expectation of much more org-specific details, for anyone from sr manager/PE through VP who'd be involved in roadmaps, HC allocation, etc.

are we assuming only 9-5? 2.3 SDE could do that, but i don't get 6. a 24/7 operation needs 2FTE and 1 PTE per shift, and there are three shifts.

And i suppose "6 SDEs" for a service that has... what number of MAU or whatever?

me and a great friend of mine split our roles with our "projects", he deals with paid clients, and i deal with our required services (email, matrix, nextcloud, pastebins, misskey, PBXen, wireguard/VPN/Proxying...) and there's just two of us. I can't even think of the last time we had downtime, except at the tail-end of last year, one or the other of us caused a fault in proxmox, where commands were no longer issuing, and we had to do a reboot and fsck, the drill.

I understand at "AWS" scale obviously there's going to be more fires than 1 or two people can put out, but 6 SDEs is over 2 million in compensation all told for amazon, right?

So i guess what i am asking for is context. I've run stuff with hundreds of millions of monthly uniques (for a paycheck). That company had over a billion uniques across all of their verticals 14 years ago. One network admin (like senior, but still). When i was there, there were 4 DBAs. there was 1 who could deal with developers, the oracle/postgresql guy who is one of my favorite humans ever - he could convert C-level grunts into sql queries or whatever it is they do. Then there was a part time mariadb DBA; and then a full time Oracle DBA for just 1 site. I'm really straining to remember who else was basically infra/ops and it was less than 6[^1] of us for the whole company. We did have a NOC in another country but prior to my arrival all they were really trained to know how to do was page the correct person in the US. Before i left, 3 of the NOC people moved to the US and started working at the office i did.

If you know anyone else that's worked for this company, you'd know. If you haven't used one of their sites today, you probably will this week.

sibling and other comments note that amazon has "consistent look and feel and management and..." so maybe all that "busywork" takes 6 full times?

[^1] i don't count the DBA team as KTLO since i can only think of a single instance when i had to be on a conference and they also were. hurricane in Va!

> 2.3 SDE could do that, but i don't get 6. a 24/7 operation needs 2FTE and 1 PTE per shift, and there are three shifts.

having the team constantly on call is a recipe for having them quit. sure, you could run this with the bare minimum number of engineers, but your turnover would be so high and given how high hiring costs tend to be, this is a net negative
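
a rough sketch of why team size matters for on-call load. this assumes a simple weekly rotation with one person on call at a time, which is an assumption for illustration and not how any particular AWS team necessarily runs it; the function name is made up too.

```python
# On-call burden per engineer under a simple weekly rotation.
# Assumes exactly one engineer is on call each week (illustrative only).

WEEKS_PER_YEAR = 52


def oncall_weeks_per_engineer(team_size: int) -> float:
    """Weeks per year each engineer spends on call in an even rotation."""
    return WEEKS_PER_YEAR / team_size


for team_size in (2, 3, 6):
    weeks = oncall_weeks_per_engineer(team_size)
    print(f"{team_size} engineers: {weeks:.1f} on-call weeks each per year")
```

with 2 people each one is on call half the year; with 6 it drops to under 9 weeks each, which is a lot closer to sustainable.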

> so maybe all that "busywork" takes 6 full times?

ktlo in a constantly changing company is not easy. software / host patching to maintain compliance is necessary busywork that requires a bit of babysitting. not to mention keeping up with required migrations to new stuff due to internal deprecations.

You’re forgetting one important point. No one who has any interest in their career inside or outside of Amazon wants to be on a service team for a dead service. That would look horrible on both their promo doc internally and when trying to get through a behavioral interview externally.

Source: former AWS employee