by dnsmichi 1719 days ago
Great analysis, thanks!

Slack support says that users should ask their ISPs to flush the DNS cache for slack.com: https://status.slack.com/2021-09/06c1e17de93e7dc2 (reachable with 8.8.8.8 as the resolver; fallback: https://slack-status.azureedge.net/)

Since the faulty DS record was published in .com, anyone behind a validating resolver that cached it may have to wait up to 24h for the TTL to expire.
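To make the 24h upper bound concrete, here is a minimal sketch of the worst case: a resolver caches the bad DS record right before it is withdrawn and keeps it for the full TTL. The function name and the timestamp are made up for illustration and are not the actual incident times.

```python
from datetime import datetime, timedelta, timezone

# 24h TTL as stated above, expressed in seconds.
DS_TTL_SECONDS = 24 * 60 * 60


def latest_expiry(record_withdrawn_at, ttl_seconds=DS_TTL_SECONDS):
    """Worst case: the record was cached just before it was withdrawn,
    so a resolver may keep serving it for one full TTL after that."""
    return record_withdrawn_at + timedelta(seconds=ttl_seconds)


# Hypothetical withdrawal time, for illustration only.
withdrawn = datetime(2021, 9, 30, 20, 0, tzinfo=timezone.utc)
worst_case = latest_expiry(withdrawn)
```

Resolvers that cached the record earlier, or that get flushed manually, recover sooner than this bound.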

Google, Cloudflare, etc. also seem to have flushed their .com caches very quickly; switching to 8.8.8.8 was the first workaround.

Meanwhile, 14 hours later, DTAG (Deutsche Telekom) in Germany still does not resolve slack.com. Its default resolvers have DNSSEC validation enabled.

dig slack.com +cd

tells the resolver to skip DNSSEC validation (it sets the CD, "checking disabled", bit in the query), and then resolution works again. Screenshots with the command output: https://twitter.com/dnsmichi/status/1443840645513293853?s=2
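For anyone curious what +cd changes on the wire, here is a rough sketch that builds a DNS query by hand with the CD bit set, using only the Python standard library. build_query and the fixed query ID are my own illustration; real dig queries carry more (for example EDNS options), so this is not a byte-for-byte reproduction of what dig sends.

```python
import struct

RD_FLAG = 0x0100  # Recursion Desired
CD_FLAG = 0x0010  # Checking Disabled (RFC 4035): skip DNSSEC validation


def build_query(name, qtype=1, checking_disabled=False, query_id=0x1234):
    """Build a minimal DNS query packet (qtype 1 = A, class IN)."""
    flags = RD_FLAG
    if checking_disabled:
        flags |= CD_FLAG
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


packet = build_query("slack.com", checking_disabled=True)
flags = struct.unpack("!H", packet[2:4])[0]
```

Sending packet over UDP to port 53 of a resolver would be the next step; with the CD bit set, a validating resolver is asked to return the answer even if DNSSEC validation fails.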

I'm very interested in the post-mortem analysis. I think the mistakes were similar to those in the nasa.gov incident; see Comcast's analysis from 2012: https://www.internetsociety.org/blog/2012/01/comcast-release...

Learnings for me:

- dnstracer (https://gitlab.com/dnsmichi/dotfiles/-/blob/main/Brewfile#L5...) helps with detecting missing glue records, but does not cover DNSSEC

- dnstrace (https://github.com/rs/dnstrace) is a better alternative with DNSSEC support