


	by lrem 1674 days ago
	Being a Googler privy to the internal postmortem: there was no way to trigger this externally (the faulty server is in the control plane) AND triggering this by a Google engineer would require some determination and leaving a ton of audit trail.


daenz 1674 days ago

>This incident was caused by a bug in the configuration pipeline that propagates customer configuration rules to GCLB.

This line suggests it could be triggered by a customer. Is that inaccurate?


KirinDave 1673 days ago

Hi. I helped write some of the internal postmortem and manage the data plane side of the team that responded to this.

Please allow me to reassure you: No. Absolutely not in this case. Not even slightly.

Any engineer can tell you customer configuration contents can cause bugs in configuration pipelines, but that's multiple layers away from this issue in our particular case.


lrem 1674 days ago

Google runs microservices, so when the public postmortem mentions a pipeline, it means a series of servers talking to each other. The problem happened towards the end of the pipeline, after multiple processing steps on the original user input. Furthermore, it was caused by a race condition, not by mishandling invalid input.
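
To make the "race condition, not invalid input" distinction concrete, here is a minimal, generic sketch. It is not the actual GCLB bug, whose details are not public. `Frontend`, `push`, `simulate`, and the version numbers are all hypothetical names invented for illustration. Two pushers send perfectly valid config versions to the last hop of a pipeline, and the only thing that differs between the two runs is the order in which their steps interleave.

```python
from dataclasses import dataclass, field


@dataclass
class Frontend:
    """Hypothetical last hop of a pipeline: holds the config version being served."""
    version: int = 0
    history: list = field(default_factory=list)


def push(frontend, new_version):
    """Check-then-act update, written as a generator so steps can be interleaved.

    The pause at `yield` stands in for a delay (for example a network round
    trip) between the check and the write.
    """
    is_newer = new_version > frontend.version  # check
    yield                                      # another pusher may run here
    if is_newer:                               # act on a possibly stale check
        frontend.version = new_version
        frontend.history.append(new_version)


def simulate(schedule):
    """Run two pushers, advancing them in the order given by `schedule`."""
    fe = Frontend()
    # Both inputs are valid; only the timing differs between runs.
    pushers = [push(fe, 2), push(fe, 1)]
    for i in schedule:
        next(pushers[i], None)  # the default avoids StopIteration when a pusher finishes
    return fe.version


serial = simulate([0, 0, 1, 1])       # push(2) completes, then push(1) is rejected
interleaved = simulate([0, 1, 0, 1])  # both checks pass before either write happens
print(serial, interleaved)
```

In the serial run the frontend ends up on version 2. In the interleaved run the stale version 1 overwrites it, even though neither input was malformed. No amount of input validation earlier in the pipeline would catch this, which is why a bug of this kind is hard to trigger deliberately from the outside.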


sroussey 1674 days ago

Hard to know without access to the postmortem, but I can think of two general approaches someone might try to exploit: 1) make config changes very quickly (mitigations for this are very likely in place), 2) make the configuration extremely large (what is valid but too large?), or 3) both.

Inflict an off-by-one error? (Joke.)
