by lrem 1674 days ago
Being a Googler privy to the internal postmortem: there was no way to trigger this externally (the faulty server is in the control plane), AND for a Google engineer to trigger it would require some determination and would leave a ton of audit trail.

>This incident was caused by a bug in the configuration pipeline that propagates customer configuration rules to GCLB.

This line suggests it could be triggered by a customer. Is that inaccurate?

Hi. I helped write some of the internal postmortem and manage the data plane side of the team that responded to this.

Please allow me to reassure you: No. Absolutely not in this case. Not even slightly.

Any engineer can tell you customer configuration contents can cause bugs in configuration pipelines, but that's multiple layers away from this issue in our particular case.

Google runs microservices, so when the public postmortem mentions a pipeline, it means a series of servers talking to each other. The problem happened towards the end of the pipeline, after multiple processing steps on the original user input. Furthermore, it was caused by a race condition, not by mishandling of invalid input.
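
For readers wondering how a race condition differs from an input-handling bug, here is a minimal sketch. It is purely hypothetical and is not the actual GCLB bug: `ConfigStore`, `rebuild_steps`, `publish`, and the rule names are all invented for illustration. The interleaving is simulated with a generator so the outcome is deterministic. The same valid input produces a good or a bad result depending only on when the downstream stage takes its snapshot.

```python
# Hypothetical illustration only: none of these names come from the postmortem.

class ConfigStore:
    """Holds the rules a downstream stage will publish."""
    def __init__(self, rules):
        self.rules = dict(rules)

def rebuild_steps(store, new_rules):
    """Non-atomic rebuild, written as a generator so steps can be interleaved.

    Step 1 clears the old rules; step 2 installs the new ones.
    """
    store.rules = {}
    yield "cleared"
    store.rules = dict(new_rules)
    yield "installed"

def publish(store):
    """Downstream stage: snapshot whatever is in the store right now."""
    return dict(store.rules)

def run(publish_after_step):
    """Run the rebuild and publish after the given step (1 or 2)."""
    store = ConfigStore({"site-a": "backend-1"})
    valid_input = {"site-a": "backend-2"}  # perfectly valid in both runs
    snapshot = None
    for step_number, _ in enumerate(rebuild_steps(store, valid_input), start=1):
        if step_number == publish_after_step:
            snapshot = publish(store)
    return snapshot

good = run(publish_after_step=2)  # publisher runs after the rebuild finishes
bad = run(publish_after_step=1)   # publisher runs between the two steps
print(good)  # {'site-a': 'backend-2'}
print(bad)   # {}
```

In this toy version no amount of input validation helps, because the input is identical in both runs; the usual fix is to make the update atomic (build the new rules aside, then swap them in with one assignment).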

Hard to know without access to the postmortem, but I can think of two general ways someone might try to take advantage of this: 1) make config changes very quickly (very likely to have mitigations here), 2) make the configuration extremely large (what is valid but too large?), 3) both.

Inflict an off-by-one error? (Joke.)