> Can't imagine what a service refactor is like at A. I bet it sucks

It's not all that hard. AWS focuses heavily on Service Oriented Architecture, with specific knowledge/responsibility domains for each service. It's a proven, scalable pattern. The APIs behind the front end will often be fairly straightforward. With clear lines of responsibility between components, you'll almost never have to worry about what other services are doing. Just fix what's right in front of you.

This is an area where TLA+ can come in handy too. Build a formal model of the design, verify it, and rebuild your service based on it.

I joined Glacier 9 months after launch, and it was in the band-aid stage. In cloud services, your first years will look roughly like this:

1) Band-aids and emergency refactoring. Customers never behave the way you expect or predict, no matter how you price your service to encourage particular behaviour. The first year is very much focused on keeping the lights on: fixing bugs and applying band-aids where needed. At AWS, the team will likely target a price decrease for re:Invent instead of new features.

2) Scalability and the first new feature work. Traffic will hopefully be picking up for your service by now, and you'll start to see where things need a bit of help to scale. You'll start working on the first bits of innovation for the platform. This is a key stage because it starts to show you where you've potentially painted yourself into a corner. (AWS will be looking for some bold feature to tout at re:Invent.)

3) Refactoring, and feature work starts in earnest. You'll have learned where your issues are. Product managers, market research, leadership, etc. will have ideas about which new features to work on, and there will be much more of a roadmap for your service. New features will be tied into the first refactoring efforts needed to scale with customer workloads, and will get you out of the corner you painted yourself into. Year 3 is where some of the fun kicks in.
The more senior engineers will drive the refactoring work; they know what was done and why, and can likely see how things need to be. A design proposal gets created and refined over a few weeks of presentations to the team and direct leadership. It's a broad-spectrum review based on constructive criticism, and engineers will challenge every assumption and design decision. There's no expectation of perfection; good enough is good enough. You just need to be able to justify your decisions.

New components are then built from the design, and rollout plans are worked out. In Glacier's case, one mechanism we used was to signal to a service that $x% of requests should use a new code path. You'd test behind a whitelist, then very, very slowly ramp up public traffic to the new code path in one smaller region, tracking metrics until you hit 100%. Then you'd repeat the process in a large region, slightly faster, before turning it on everywhere. For other pieces, we'd figure out ways to run things in shadow mode: the same requests hit both the old and the new code, with the new code neutered, and then we'd compare the results.

Side note: one of the key values engineers get evaluated on before reaching "Principal Engineer" level is "respect what has gone before". No one sets out to build something crap. You likely weren't involved in the original decisions, and you don't necessarily know the thinking behind various choices. Respect that those before you built the best thing they could within the constraints known at the time. The same applies going forward: respect what is in front of you now, and be aware that in 3-5 years someone will be refactoring what you're about to create. The design document you present to your team now will help the next engineers when they come to refactor it down the line. Things like TLA+ models will be invaluable here too.
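To make the "$x% of requests use the new code path, with a whitelist" idea concrete, here's a minimal Python sketch. It is not Glacier's actual mechanism (the comment doesn't describe the internals); the function names, the use of a request ID as the bucketing key, and SHA-256 hashing are all assumptions for illustration. Hashing gives each request a stable bucket, so raising the percentage only ever adds requests to the new path.

```python
import hashlib


def bucket(request_key: str) -> float:
    """Map a key to a stable value in [0, 100) (hypothetical helper)."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce to 0.00..99.99.
    value = int.from_bytes(digest[:8], "big")
    return (value % 10_000) / 100.0


def use_new_code_path(
    customer_id: str,
    request_id: str,
    percent: float,
    allowlist: frozenset[str],
) -> bool:
    """Decide whether this request takes the new code path (hypothetical API)."""
    # Whitelisted (test) customers always get the new path, even at 0%.
    if customer_id in allowlist:
        return True
    if percent <= 0.0:
        return False
    if percent >= 100.0:
        return True
    # Same request ID -> same bucket, so the decision is deterministic.
    return bucket(request_id) < percent
```

Because a request is routed to the new path whenever its bucket is below the current percentage, anything on the new path at 5% stays on it at 10%, which keeps the ramp from flapping individual requests back and forth.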
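The staged ramp (slowly in one smaller region, somewhat faster in a large region, then everywhere, watching metrics the whole way) can be sketched as a plan plus a loop that halts on bad metrics. The region labels, the specific percentages, and the `healthy`/`set_percent` callbacks are hypothetical placeholders; a real rollout would also wait for metrics to bake between steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    region: str                  # hypothetical region label
    percents: tuple[float, ...]  # traffic percentages to step through, in order


# Assumed plan: many small steps in the small region, fewer and bigger steps
# in the large region, then a single flip for everything else.
PLAN = (
    Stage("small-region", (1.0, 5.0, 10.0, 25.0, 50.0, 100.0)),
    Stage("large-region", (5.0, 25.0, 50.0, 100.0)),
    Stage("all-other-regions", (100.0,)),
)


def run_rollout(
    plan: tuple[Stage, ...],
    healthy: Callable[[str, float], bool],
    set_percent: Callable[[str, float], None],
) -> bool:
    """Walk the plan; roll back the current region and stop if metrics look bad."""
    for stage in plan:
        for pct in stage.percents:
            set_percent(stage.region, pct)
            # In practice you'd let metrics accumulate for a while before checking.
            if not healthy(stage.region, pct):
                set_percent(stage.region, 0.0)  # send this region back to the old path
                return False
    return True
```

Rolling back only the region that failed is one design choice among several; regions already at 100% stay there in this sketch, on the assumption that they've already proven themselves.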
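Shadow mode, where the same request goes to both the old and new code, only the old result is served, and the two are compared, might look roughly like the Python sketch below. The handler signatures and the `ShadowStats` record are assumptions, and the sketch assumes the new handler has already been "neutered" (for example, its writes disabled) so calling it has no side effects.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ShadowStats:
    matches: int = 0
    mismatches: int = 0
    errors: int = 0
    # A few (request, old_result, new_result) tuples kept for debugging.
    mismatch_samples: list = field(default_factory=list)


def handle_with_shadow(
    request: Any,
    old_handler: Callable[[Any], Any],
    new_handler: Callable[[Any], Any],
    stats: ShadowStats,
    max_samples: int = 10,
) -> Any:
    """Serve from the old path; run the new path only to compare results."""
    result = old_handler(request)  # old path stays authoritative
    try:
        shadow = new_handler(request)  # assumed side-effect free
    except Exception:
        # A failure in the shadow path must never affect the customer.
        stats.errors += 1
        return result
    if shadow == result:
        stats.matches += 1
    else:
        stats.mismatches += 1
        if len(stats.mismatch_samples) < max_samples:
            stats.mismatch_samples.append((request, result, shadow))
    return result
```

Running the shadow call inline doubles latency; a real service would more likely fire it asynchronously or on a sampled subset of traffic, but the comparison logic stays the same.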
The not-invented-here (or by-me) syndrome is probably also at play here.