| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by jsnell 2536 days ago
	Yep, that's the exact bullet point I was writing a response on. Security and abuse are of course special little snowflakes, with configs that need to be pushed very fast, contrary to all best practices for safe deployments of globally distributed systems. An anti-abuse rule that takes three days to roll out might as well not exist. The only way this makes sense is if they mean that there'll be a staged rollout of some sort, but it won't be the same process as for the rest of their software. I.e. for this purpose you need much faster staging just due to the problem domain, but even a 10 minute canary should provide meaningful push safety against this kind of catastrophic meltdown. And the emergency process is something you'll use once every five years.

2 comments

blr246 2536 days ago

Your response highlights a good idea to mitigate the risk I was trying to highlight in mine.

They want to have a rapid response path (little to no delay using staging envs) to respond to emergencies. The old SOP allowed all releases to use the emergency path. By not using it in the SOP anymore, I'd be concerned that it would break silently from some other refactor or change.

Your notion is to maintain the emergency rollout as a relaxation of the new SOP such that the time in staging is reduced to almost nothing. That sounds like a good idea since it avoids maintaining two processes and having greater risk of breakage. So, same logic but using different thresholds versus two independent processes.

link

jsnell 2536 days ago

Right. The emergency path is either something you end up using always, or something you use so rarely that it gets eaten by bit-rot before it gets ever used[0]. So I think we're in full agreement on your original point. This was just an attempt to parse a working policy out of that bullet point.

[0] My favorite example of this had somebody accidentally trigger an ancient emergency config push procedure. It worked, made a (pre-canned) global configuration change that broke everything. Since the change was made via this non-standard and obsolete method, rolling it back took ages. Now, in theory it should have been trivial. But in practice, in the years since the functionality had been written (and never used), somehow all humans had lost the rights to override the emergency system.

link

jacques_chester 2535 days ago

My personal rule is that any code which doesn't get exercised at least weekly is untrustworthy. I once inherited a codebase with a heavy, custom blue-green deploy system (it made sense for the original authors). While we deployed about once a week, we set up CI to test the deployment every day.

Cold code is dead code.

link

solatic 2535 days ago

> Security and abuse are of course special little snowflakes, with configs that need to be pushed very fast, contrary to all best practices for safe deployments of globally distributed systems.

Once upon a time, I worked on a system where many values which would otherwise be statically defined in similar systems where instead put into a database table. This particular system didn't have a proper testing and deployment pipeline set up, so whereas a normal system would just change the static value at some hard-coded point in the code and quickly roll it out, this system needed to keep it in the database so that it would be changeable in between manual deployments (months or even years apart). The ability to change the value facing the user by changing the value in the database inflated the time it took to test a release, thus exacerbating the amount of time it took to release a new version, but well, it worked.

My point is that if security and abuse rules need to be rolled out quickly, then the system needs security and abuse systems where the entire range of security and abuse configurations (i.e. their types) are a testable part of the original pipeline. Then the configurations can safely be changed on the fly, so long as the changes type-check.

It's easy to understand why it's never been built though - you'd need both a security background and a Haskell-ish/type-theory kind of background. Best of luck finding people like that.

link