

by rio517 1541 days ago

I struggle with a lot of the arguments made here. I think one key thing is that staging can mean different things. In the author's case, they say "can’t merge your code because someone else is testing code on staging." It is important to differentiate between this type of staging, used for testing development branches, and a staging environment where only what's already merged for deployment is automatically deployed.

Many of the problems are organizational/infrastructure challenges, not inherent to staging environments/setups. Straightening out dev processes and investing in the infrastructure solves most of the challenges discussed.

Their points:

What's wrong with staging environments?

* "Pre-live environments are never at parity with production" - resolved with proper investment in infrastructure.

* "There’s always a queue [for staging]" - is staging the only place to test pre-production code? If you need a place to test code that isn't in master, consider investing in disposable staging environments or better infrastructure so your team has more confidence for what they merge.

* "Releases are too large" - shorter queues reduce deployment times. Manage releases so they're smaller.

* "Poor ownership of changes" - of course this happens with all that queued code. Address the earlier challenges and this will be massively mitigated. Once there, a good manager's job is to ensure it doesn't happen.

* "People mistakenly let process replace accountability" - this is a management problem.

Solving some of the above challenges with the right investments creates a virtuous cycle of improvements.

How we ship changes at Squeaky?

* "We only merge code that is ready to go live" - This is quite arbitrary. How do you define/ensure this?

* "We have a flat branching strategy" - Great. It then surprises me that they have so much queued code and such large releases. I find it surprising they say, "We always roll forward." I wonder how this impacts their recovery time.

* "High risk features are always feature flagged" - do low risk features never cause problems?

* "Hands-on deployments" - I'm not sure this is good practice. How much focus does it take away from your team? Wouldn't a hands-off deployment work better, with high pre-deploy confidence, automated deployment, and automated monitoring and alerting, while ensuring the team is available to respond and recover quickly?

* "Allows a subset of users to receive traffic from the new services while we validate" is fantastic. Surprised they don't break this into its own thing.
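
To make the last point concrete, here is a minimal sketch of percentage-based routing, which is also one common way to gate a feature flag. It is not how Squeaky does it; the function names, the "new-service"/"stable-service" labels, and the hashing scheme are all assumptions for illustration. The idea is that each user hashes into a stable bucket, so the same user keeps seeing the same version while the percentage is raised.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hypothetical helper: hashes (feature, user_id) into one of 10,000
    buckets so the decision is deterministic per user and per feature.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100  # percent=10 admits buckets 0..999


def pick_backend(user_id: str, percent: float) -> str:
    """Route a user to the new or the stable service (labels are made up)."""
    if in_rollout(user_id, "new-service", percent):
        return "new-service"
    return "stable-service"
```

Because the bucket is fixed per user, raising the percentage only adds users to the new service and never flips existing ones back, and setting it to 0 acts as a kill switch without a redeploy.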