What this suggests is that Slack, for reasons passing understanding, enabled DNSSEC on their zones (with a DS record that essentially turns DNSSEC on, and the accompanying key records) --- then disabled DNSSEC by pulling all the records. But the DS records are in caches; validating resolvers go looking for the keys, which don't exist, and say "welp, I guess Slack.com doesn't exist".
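To make the failure mode concrete, here is a toy model of it in Python. It is only a sketch under stated assumptions, not how any real resolver is implemented: `ToyResolver`, `cache_ds`, and `resolve` are hypothetical names, and the model assumes the parent zone has already dropped the DS record, so a resolver with nothing cached sees an ordinary unsigned delegation. Real validating resolvers generally answer SERVFAIL when validation fails, which to a user looks the same as the site being gone.

```python
from dataclasses import dataclass, field

HOUR = 3600  # seconds


@dataclass
class ToyResolver:
    """Hypothetical validating resolver that only tracks cached DS records."""

    # zone name -> time (in seconds) at which the cached DS record expires
    ds_cache: dict = field(default_factory=dict)

    def cache_ds(self, zone: str, now: int, ttl: int) -> None:
        """Remember that a DS record for `zone` was seen at time `now`."""
        self.ds_cache[zone] = now + ttl

    def resolve(self, zone: str, now: int, zone_has_dnskey: bool) -> str:
        """Return the response code a client would see for `zone`."""
        expiry = self.ds_cache.get(zone)
        ds_cached = expiry is not None and now < expiry
        if ds_cached and not zone_has_dnskey:
            # The cached DS says the zone must be signed, but no key exists
            # to validate against, so validation fails.
            return "SERVFAIL"
        # Assumption: the parent no longer publishes a DS record, so without
        # a cached DS the zone is treated as unsigned and resolves normally.
        return "NOERROR"


resolver = ToyResolver()
resolver.cache_ds("slack.com", now=0, ttl=24 * HOUR)

# Keys pulled two hours after the DS was cached: validation fails.
during_outage = resolver.resolve("slack.com", now=2 * HOUR, zone_has_dnskey=False)

# Once the cached DS has expired, the same unsigned zone resolves again.
after_expiry = resolver.resolve("slack.com", now=25 * HOUR, zone_has_dnskey=False)

print(during_outage, after_expiry)
```

The point of the model is that nothing the zone owner does on the authoritative side fixes the first case, short of putting the keys back; the cached DS has to age out.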
I wonder if they are using tooling that doesn't properly retain DNSKEY records for DS records that were recently removed? This is one of the reasons we perform controlled, automated key rotation and removal in DNSimple: it ensures we retain the keys in the authoritative zone on each key rollover, giving the DS records time to expire from caches.
We had a DNS-related outage with Route 53. Some of our zones just lost some records, and then they reappeared. Could that explain what happened to Slack's DNSSEC-related records?
Aren't you, in fact, the same Thomas Ptacek who has repeatedly claimed that DNSSEC is so irrelevant that events like this would go essentially unnoticed?
> DNSSEC is moribund and almost nobody uses it; in reality, the DNSSEC root private keys could land on Pastebin tomorrow and nothing would "break"
We have this whole thread here about a "service disruption" for Slack, and nobody leaked the "root private keys"; just one person made a dumb error and it blew up their site.
No, I'm the Thomas Ptacek who has repeatedly claimed that the only impact DNSSEC is going to have on the Internet is causing outages like this. It's right there in the blog posts; in fact, it's even in the 2007 blog posts I wrote about this on the Matasano blog.
The USG DNSSEC requirements, which seem to be a part of what happened, are fragmented and incoherent. OMB withdrew DNSSEC requirements in 2018, and CLOUD.GOV doesn't support it. But some older requirements documents still have them, and need to be updated.
The important top-line thing to know here is that virtually all tech companies eschew DNSSEC (you can verify that for yourself with `host -t ds stripe.com`; substitute any other company for Stripe).
https://dnsviz.net/d/slack.com/YVXX_g/dnssec/ is the dnsviz analysis showing the slack.com zone DNSKEY existing at 12:55, followed by the .com zone DS record at 15:30. However, the next analysis at 17:24 shows that both the .com zone DS and slack.com DNSKEY records have disappeared!
Given that the slack.com DNSKEY shows up with a 1h TTL and the .com zone DS has a 24h TTL, they are screwed in the presence of cached slack.com DS records from the .com zone. Do not throw away your DNSKEY until your delegation's TTL has absolutely positively surely expired from any resolver caches!
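The arithmetic behind that rule is simple enough to sketch. The snippet below assumes the 24h DS TTL mentioned above; the function name `earliest_safe_dnskey_removal`, the one-hour safety margin, and the example timestamp are all made up for illustration, and it assumes resolvers honor TTLs rather than holding records longer.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_dnskey_removal(
    ds_removed_at: datetime,
    ds_ttl: timedelta,
    margin: timedelta = timedelta(hours=1),
) -> datetime:
    """Earliest time the DNSKEY can go, given when the DS left the parent.

    A resolver may have cached the DS just before it was removed, so the
    cached copy can live for a full DS TTL after the removal time. The
    margin is an arbitrary extra buffer, not a value from any standard.
    """
    return ds_removed_at + ds_ttl + margin


# Hypothetical removal time, chosen only for the example.
ds_removed_at = datetime(2021, 1, 1, 12, 0, tzinfo=timezone.utc)
ds_ttl = timedelta(hours=24)  # DS TTL in the .com zone, per the comment above

safe_at = earliest_safe_dnskey_removal(ds_removed_at, ds_ttl)
print(safe_at.isoformat())
```

Note that the 1h DNSKEY TTL does not enter into it: what matters for withdrawing keys is how long the parent's DS can linger in caches, not how quickly the keys themselves expire.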
The slack.com domain is an AWS Route 53 zone; I'd be really interested to see a post-mortem explaining what happened here. Are they unable to recover the KSK/ZSK and restore the DNSKEY/etc records?