
Amazon API issues and the amount of time it takes to "discover" complex application stacks in production.

Concrete example: Our framework pulls in remote service dependencies via a link to an ELB in order to set remote HTTP endpoint URLs (yes, we know service discovery is a thing, but that's not where we were when we started). Some projects have 15+ dependencies, and walking the dependency tree would take literally hours. As a workaround, someone added the ability to pass those endpoint URLs in directly, and the deployments were revised to build the remote URLs via string interpolation. Deployment time dropped to 10 minutes once we walked away from the concept of deploying stacks from the top down.
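To make the workaround concrete, here is a rough sketch of the idea, not our actual code. The URL naming convention, the `build_endpoints` helper, and the override parameter are all made up for illustration; the point is only that the endpoint map comes from string interpolation plus explicitly passed values, with no API calls to discover anything.

```python
from string import Template

# Hypothetical naming convention, assumed for this sketch only.
URL_TEMPLATE = Template("https://${service}.${env}.internal.example.com")


def build_endpoints(dependencies, env, overrides=None):
    """Return a {service: url} map without querying any cloud API.

    dependencies: names of the remote services this project needs.
    env:          deployment environment, e.g. "prod".
    overrides:    optional {service: url} passed in by the caller; these
                  take precedence over the interpolated default.
    """
    overrides = overrides or {}
    endpoints = {}
    for service in dependencies:
        if service in overrides:
            # Explicitly passed-in endpoint wins.
            endpoints[service] = overrides[service]
        else:
            # Otherwise derive the URL from the naming convention.
            endpoints[service] = URL_TEMPLATE.substitute(service=service, env=env)
    return endpoints


if __name__ == "__main__":
    deps = ["billing", "auth"]
    for name, url in build_endpoints(deps, "prod").items():
        print(name, url)
```

The cost is constant per dependency, so 15+ dependencies resolve instantly. The tradeoff is that this only works if every service actually follows the naming convention, or gets an explicit override.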

Second concrete example: A developer passed an incorrect argument during a deployment and deployed a second full stack of his application rather than replacing a single service. (I understand most other tools have diffs/change sets, but this particular developer isn't the sharpest knife in the drawer...) Rather than fix it immediately, he manually fiddled with DNS entries and launch configs to create a mishmash stack. Naturally, he didn't tell anyone, so it took weeks (and lots of EC2 $$$) before we found and fixed it all.

I do see some value in an automated full stack deploy with all dependencies, but it should be the exception and not the rule.