by tptacek
209 days ago
The things you do to safeguard the rollout of a configuration file change are not the same as the things you do to reliably propagate changes that might happen many times per second. What irritates me are the claims that there's nothing distinguishing real-time control plane state changes from config files. Most of us have an intuition for how we'd do a careful rollout of a config file change. That intuition doesn't hold for control plane state; it's like saying, for instance, that OSPF should have canaries and staged rollouts every time a link state changes. I'm not saying there aren't things you can do to make real-time control plane state propagation safer, or that Cloudflare did all those things (I have no idea, I'm not familiar with their system at all, which is another thing irritating me about this thread --- the confident diagnostics and recommendations). I'm saying that people trying to do the "this is just like CrowdStrike" thing are telling on themselves.
|
I took the "this sounds like CrowdStrike" tack for two reasons. The write-up characterized this update as a process that runs every five minutes. And the update, being a file of rules, felt analogous in format to the CrowdStrike signature database.
I appreciate the OSPF analogy. I recognize there are portions of these large systems that operate more like a routing protocol (with updates being unpredictable in velocity or time of occurrence). The write-up didn't make this seem like one of those. It seemed a lot more like a traditional daemon process receiving regular configuration updates and crashing on a bad configuration file.