Good article, although I'm a bit surprised at how difficult the author makes rolling back the log bug sound. Why not simply serve 500s to 95% of your clients? That way, the logs get sent to you gradually.
I think they expected it wouldn't have much effect, since the client was also updated to delete the old (corrupted) log. Because logs were always deleted after a successful upload, and uploads started with the oldest log, the corrupted log would necessarily become the oldest.
By the time the DDoS was in effect, the corrupted logs had been deleted by the client. They would now always succeed (even with the old server code, or old client code) until they got a new corrupted log.
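To make the queue behaviour described above concrete, here is a minimal Python sketch. It is only my reading of the thread, not the author's client code: `upload_pending`, `send`, and the `drop_rejected` flag are hypothetical names, and the real client presumably talks HTTP rather than calling a function.

```python
from collections import deque


def upload_pending(logs, send, drop_rejected=False):
    """Upload pending logs oldest-first and return how many were accepted.

    logs: deque of pending logs, oldest at the left (assumed layout).
    send: callable returning True if the server accepted the log
          (hypothetical stand-in for the real upload request).
    drop_rejected: if True, discard a rejected log instead of retrying it,
          which models the updated client deleting the corrupted log.
    """
    uploaded = 0
    while logs:
        if send(logs[0]):
            logs.popleft()  # logs are deleted only after a success
            uploaded += 1
        elif drop_rejected:
            logs.popleft()  # updated client: throw the bad log away
        else:
            break  # old client: the rejected log stays at the head and blocks the rest
    return uploaded
```

With the old behaviour, one rejected log at the head holds back every newer log behind it. Once that log is dropped, later uploads go through until another bad log reaches the head of the queue.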
Yes, I'm saying that they could have just served 500s to clients (even ones with regular log files), and those clients would have backed off and retried later. That is essentially what the "chillout" method does too, but it doesn't sound like the author had considered it.
I'm guessing the answer is that adding a feature to serve 500s to a fraction of the incoming requests would have required about as much effort as just enabling the chillout feature.
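For what it's worth, the "serve 500s to a fraction of requests" idea could look roughly like the Python sketch below. This is an illustration under my own assumptions, not anything from the article: `should_shed`, `handle_upload`, and `shed_fraction` are made-up names, and it assumes clients back off and retry later when they see a 500.

```python
import random


def should_shed(shed_fraction, rng):
    """Return True if this request should be rejected.

    shed_fraction: share of requests to reject, between 0.0 and 1.0.
    rng: a random.Random instance (passed in so behaviour can be seeded).
    """
    return rng.random() < shed_fraction


def handle_upload(body, shed_fraction, rng):
    """Return an HTTP status code for a log upload request (hypothetical handler)."""
    if should_shed(shed_fraction, rng):
        return 500  # assumption: the client backs off and retries later
    # Real log processing would go here; this sketch just accepts the upload.
    return 200
```

With `shed_fraction=0.95`, roughly one request in twenty gets through on each attempt, so the backlog drains gradually. The code is small, but it would still have to be written, deployed, and tuned in the middle of an incident.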