Akamai Edge DNS was down

https://soundcloud.com/ryan-flowers-916961339/dns-to-the-tun...

People don't believe me when I say how much DNS matters. So I wrote a song about it.

dolni 1794 days ago

> People don't believe me when I say how much DNS matters.

That's weird to me. I have been working in sysadmin/DevOps for over a decade, but it did not take me very long to learn that DNS outages cause massive problems.

Right, but everybody has to learn that at some point. And I happen to be somebody who teaches such things. The importance of DNS is hard to overstate, but I go to great lengths to do exactly that, to make a point ;)

wpasc 1794 days ago

dns, DNS, dns, dns. The start of every process, dns.

Love this.

Thank you! I'm glad that landed where I wanted it to. It was a lot of fun to put together. I keep threatening to make a video. I need a collection of DNS memes so that I can just slideshow them.

https://www.youtube.com/watch?v=a3ww0gwEszo

wpasc 1794 days ago

Haha please do, a video would be great. Your song reminded me of the song: "Find the Longest Path" which you may get a kick out of:

southerntofu 1794 days ago

Sounds amazing! Do you maybe have a direct link? Soundcloud doesn't want us privacy-conscious users browsing their website :(

geocrasher 1793 days ago

As you wish :) https://meetryanflowers.com/wp-content/uploads/2021/07/DNS.m...

southerntofu 1793 days ago

I had a good laugh, thank you very much! :)

Should probably have a script play that on loudspeakers when monitoring detects problems /s

ricardo81 1794 days ago

Brilliant. NOERROR for this.

Frost1x 1794 days ago

This made my day, thanks!

And this, mine! Thanks!

mvanbaak 1794 days ago

Awesome! Thank you.

pololee 1794 days ago

Thank you!! lol

brianjking 1794 days ago

lol, thanks for the laugh.

patleeman 1794 days ago

I just teared up

LOL! great comment thank you!

kevando 1794 days ago

lol

zyberzero 1794 days ago

Thank you!

https://downdetector.com/archive

dbsmith83 1794 days ago

So many sites down... and unfortunately not one of them is Twitter

cpgeier 1794 days ago

Amazing that down detector manages to stay up during these kinds of outages. Noticed it has been a little slow but they really have done a good job keeping it up even though large portions of the internet are down right now.

mindcrime 1794 days ago

Who detects if Down Detector is down? Is there an isdowndetectordown.com site?

ksec 1794 days ago

I guess the mother of all Network Downtime checker is HN.

It's parked by GoDaddy, but unfortunately their website is fubar by this outage if you try to click through to see how much they want for it :)

SahAssar 1794 days ago

Sounds like when Fuckedcompany put itself on Fuckedcompany.

cube00 1794 days ago

"I dunno. Coast Guard?"

It's interesting that they report an AWS outage but there don't seem to be any issues there. Looks like their methodology is a bit too reliant on those speculative tweets from the first 5 minutes of all these sites going down. https://downdetector.com/status/aws-amazon-web-services/

> So many websites are down, are AWS servers down or something?

> Amazon web services is down which is affecting a lot of company web sites and services. Not sure what is going on.

> Miss us? @aldotcom and a whole bunch of other folks have been knocked off the internet by what appears to be an AWS attack/system failure. We'll be back.

mandelbrotwurst 1794 days ago

It’s just based on user reports, so this is people mischaracterizing it as an AWS outage.

Yep that's my point. I'm guessing that for a lot of sites they can verify if there's an outage pretty easily when they see a spike in reports, but for something like AWS unless they updated their status page (lol) or downdetector ran a bunch of stuff on there just to check with, I guess they don't have a good way to verify it.

mandelbrotwurst 1793 days ago

Gotcha, yeah I guess I always just considered that out of scope for their service and that it’s just a report aggregator but I suppose you would expect it to be at least a little bit clever based on the “detector” name

jacob019 1794 days ago

cloudfront was down too

grawprog 1794 days ago

You got your wish, looks like Twitter's on the list now too.

dheera 1794 days ago

Is there a way to tell your system to fall back to the last known IP address if DNS server isn't reachable?

Basically soft-invalidate your local DNS cache but bring it back from the cache graveyard if DNS is down.

elithrar 1794 days ago

You could run a local resolver like dnsmasq or Unbound that can “serve stale” on upstream failures, but that assumes the DNS failure is a client-facing resolver one.

From what I observed here, it was more internal DNS related: Newegg was serving an opaque “DNS failure” error page from Akamai’s front-end which is likely because their infra was failing to resolve names internally.

TimWolla 1794 days ago

Unbound has a 'serve-expired' option: https://nlnetlabs.nl/documentation/unbound/unbound.conf/#ser...

bombcar 1794 days ago

It should be possible to set your cache so it lives forever but still checks for a new IP at normal expiring time.
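The "serve stale" idea discussed in this subthread can be sketched in a few lines. This is only an illustration, not how dnsmasq or Unbound implement it: the class name `StaleServingCache`, the `resolve` callback, and the injectable `clock` are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleServingCache:
    """Hypothetical cache: refresh after the TTL, but keep the old answer as a fallback."""

    def __init__(self, resolve: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve   # upstream lookup; expected to raise OSError on failure
        self._ttl = ttl           # seconds before an entry counts as expired
        self._clock = clock       # injectable so the behaviour can be tested without sleeping
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry time)

    def lookup(self, name: str) -> Optional[str]:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, no upstream query needed
        try:
            address = self._resolve(name)
        except OSError:
            # Upstream is unreachable: serve the expired answer if one exists.
            return cached[0] if cached is not None else None
        self._entries[name] = (address, now + self._ttl)
        return address
```

For real lookups, `socket.gethostbyname` could be passed as `resolve`, since its `socket.gaierror` is a subclass of `OSError`. Note that this only helps when the resolver fails outright; as pointed out elsewhere in the thread, it does nothing if the resolver returns a wrong address.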

1f60c 1794 days ago

> Unfortunately not one of them is Twitter

Please keep comments like this off HN

tjpnz 1794 days ago

Just got booted out of Netflix on the PS4 because the console could no longer connect to Sony's license server. Netflix was working just fine by the way.

lxgr 1794 days ago

Was the app installed/running using a secondary PSN account by any chance? This shouldn't be happening on a primary account/console pair.

tjpnz 1794 days ago

It should be my primary although I've often seen it revert back after setting it. I did try setting it as my primary again but you know.

vmception 1794 days ago

Ah, that's what's going on. Happened to me as well, I just assumed that Sony is neglecting PS4 performance with its new system, while bogging it down with bloatware.

hackerbrother 1794 days ago

Yup, I learned Hulu on Xbox One relies heavily on some Microsoft authentication during a recent Office 365 or Azure outage (not sure which).

tyingq 1794 days ago

You can see this on a lot of sites right now. You get the Akamai style error with something like:

  Reference: #11.453a2f17.1393u44848484.3aee33433

At the bottom of a very bland looking error page.

halfmatthalfcat 1794 days ago

You could argue Akamai is the blandest of the CDN bunch; their UIs are atrocious.

chrisweekly 1794 days ago

Their APIs are (or, were, last I suffered their use a few years ago) also terrible, eg blanket policy of refusing to cache any resource in the presence of "Vary" header, regardless of its value, and failure to honor standard HTTP headers... thankfully there are many other options for CDN, which are SO MUCH BETTER.

Akamai is their own worst enemy most of the time. Their prices are the highest, they trail on features, their documentation opaque, it takes an hour to propagate changes, etc. Only a few years ago you could only use SSL if you purchased their ridiculously expensive pci-dss plan - I thought they would defend that to their grave.

Better alternatives are Cloudflare, Fastly, AWS CloudFront.

Google Cloud CDN always seems to have very good latency but a very bare bones feature set and no edge compute I can identify. Support is always a huge red mark for Google anything.

youngtaff 1794 days ago

Surely it depends what you vary on?

Content-Encoding should be well supported, User-Agent less so and for very good reasons (there's too much variation in UA strings)

https://learn.akamai.com/en-us/webhelp/adaptive-media-delive...

judge2020 1794 days ago

> AMD automatically strips these headers out of requests to support caching for faster delivery.

> I need the Vary HTTP headers: AMD can cache the associated object if the Vary HTTP header contains only "Accept-Encoding" and "Gzip" is present in the Content-Encoding header

(AMD in this case standing for Akamai Media Delivery)
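Read literally, the quoted documentation sentence amounts to a small predicate over response headers. The sketch below encodes only that one sentence; the function name is made up, and the behaviour for responses without a Vary header is an assumption, not something the quote states. It says nothing about what Akamai's servers actually do.

```python
from typing import Mapping


def cacheable_under_quoted_rule(response_headers: Mapping[str, str]) -> bool:
    """Apply the quoted rule to one response's headers (header names are case-insensitive)."""
    headers = {k.lower(): v for k, v in response_headers.items()}
    vary = headers.get("vary")
    if vary is None:
        return True  # assumption: with no Vary header the rule simply does not apply
    vary_fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    if vary_fields != {"accept-encoding"}:
        return False  # Vary names something other than (or in addition to) Accept-Encoding
    encodings = {e.strip().lower() for e in headers.get("content-encoding", "").split(",")}
    return "gzip" in encodings
```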

acdha 1794 days ago

It wasn't that simple — IIRC, for a while Vary meant “don't cache anything, ever, under any circumstances” unless you made some custom configuration changes. Over time they _added_ support for just “Vary: Accept-Encoding” (IIRC less than a decade ago) and that was fragile. They improved that over time but it was painful for a number of years because there were various failure modes which meant things wouldn't be cached, or (IIRC) compression would be disabled for certain URLs sporadically if the first request for the option did not request transfer compression.

dylan604 1794 days ago

yeah, but only tech nerds see it, so it's okay. maybe it's a ploy to get the users to go to the real command set via CLI. make it so shitty nobody wants the UI, and goes back to the terminal. "if you're not a CLI ninja, then you shouldn't be using our product anyways!"

lowbloodsugar 1794 days ago

What's frustrating is that DNS is returning an address, instead of just failing, and so macos is caching that value (though it might be cloudflare doing that).

LeoPanthera 1794 days ago

To empty the macOS DNS cache:

sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

space_ghost 1794 days ago

Wildcard DNS should be a prosecutable crime, punishable by no less than 20 years of hard labor. (Edit: Probably should have made it clear that this was a joke)

gokhan 1794 days ago

Wildcard DNS helps me to handle multitenancy easily. What's wrong with it?

breakingcups 1794 days ago

I don't see how wildcard DNS is related to this? Nor how it's bad?

adamdoran 1794 days ago

Presumably you're referring to the practice of answering queries for nonexistent records with an A record belonging to an advertisement page? (instead of doing the right thing answering NXDOMAIN, presuming no records of another type also exist for the queried name.)

dnsmasq has a really useful feature for dealing with this: --bogus-nxdomain
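The effect of that option can be illustrated roughly as follows: any reply containing a listed address is treated as if the name did not exist. This is a sketch of the idea only, not dnsmasq's code; the function name and the use of `None` to stand in for NXDOMAIN are assumptions.

```python
from typing import Iterable, List, Optional, Set


def filter_bogus_answers(addresses: Iterable[str], bogus: Set[str]) -> Optional[List[str]]:
    """Return None (standing in for NXDOMAIN) if any answer is a known bogus address."""
    answers = list(addresses)
    if any(address in bogus for address in answers):
        return None  # the reply points at an ad/landing server, so treat the name as nonexistent
    return answers
```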

dylan604 1794 days ago

When did congress members start posting to HN?

mvelie 1794 days ago

Akamai believes they have it fixed. We've seen our traffic return to normal. https://twitter.com/Akamai/status/1418251400660889603

roody15 1794 days ago

hmmm does not appear fixed here in the Midwest

cbeley 1794 days ago

I wonder if this is why LastPass is down. It has completely locked me out of my vault. You'd think it'd continue to work offline in a case like this. :/

eunai 1794 days ago

I switched to BitWarden and haven't looked back. You can use it on the phone and pc (browser). As well as a desktop client.

fredski42 1794 days ago

And with vaultwarden you can go self hosted with a very lightweight server written in rust.

AnIdiotOnTheNet 1794 days ago

Switched to vaultwarden at work for password management, only have minor gripes so can recommend.

benburleson 1794 days ago

Yeah, my path was LastPass -> Bitwarden -> 1Password.

Both Bitwarden and 1Password are great.

JonathanMerklin 1794 days ago

Then what was the impetus to switch off of Bitwarden?

decrypt 1794 days ago

Same path. It'll be very hard to move away from 1Password. App experience, sync, security features like key in addition to master password, family organizer-based recovery of an account, these are a few things that stand out.

macintux 1794 days ago

Yeah, I use 1Password for every critical bit of information (SSN numbers, physical access codes) and a whole lot of less-critical stuff. I expect to be a customer for life.

revscat 1794 days ago

Can you explain what family organizer-based recovery means? It sounds like dad or mom could recover a kids password?

eddieroger 1794 days ago

That's about right for what it is, or at least how I think about it. There's no magic "unlock vault" button (by design), but an Organizer can kick off a workflow to reset a vault if need be. I have a few of the more tech-savvy family members set as organizers in my family in case something ever happens to me.

https://support.1password.com/recovery/

chewmieser 1794 days ago

My favorite feature personally is the built-in 2FA support. Click and it logs into your account and copies the 2fa code to clipboard so just paste on next screen.

Multiple vaults too is nice but I know others have ways to limit exposure of passwords in similar manners.

arnado 1794 days ago

Bitwarden offers this as well, but I don't really understand why you would want it. If someone compromises your password manager, 2FA is now worthless. Or am I misunderstanding how it works?

raffraffraff 1794 days ago

I prefer the browser addon for bitwarden over 1Password. Try editing a site in 1Password. It forces you to log into the full site, whereas bitwarden can do almost everything right there in the addon.

judge2020 1794 days ago

This is also possible with the 1Password X extension, however there's a lot of feature segmentation and unclear messaging between the Desktop app-based version and 1Password X so I don't blame you for using the old one.

When it comes to password managers, 1password is the one to beat. Much better experience in every regard.

davidjgraph 1794 days ago

Serious question, has anyone properly solved the issue of DNS as a single point of failure?

sakisv 1794 days ago

Depending on what point you draw the line of "single point of failure" you could use multiple providers for your dns.

GOV.UK for example uses both aws and gcp for DNS

davidjgraph 1794 days ago

So, NS entries pointing to both? But then take the example your domain was in Route53 and AWS goes down. You can't configure the NS entries to avoid AWS DNS servers. Is the idea that child DNS servers detect the outage and cache the values in the name server(s) that remain up?

But then, the cached values from AWS take a while to clear, TTL never seems to be applied properly. It always feels like the worst case in such a scenario is you can point everyone at the right thing within 24 hours.

wongarsu 1794 days ago

Configuring two NS entries is pretty standard, so surely most resolvers try one of the two, and if it's down try the other one? What else would be the point of having multiple nameservers? Then you just have to get two nameserver providers and make sure their settings stay synced, and point your domain to one nameserver from each.

Of course that requires the server to properly fail, i.e. stop responding to requests. That doesn't seem to be the case here

tpetry 1794 days ago

You set both services in your ns records. So every day they share the load for dns resolution. If one day one of them is down the client can/will use a different nameserver from your configuration.

https://github.com/octodns/octodns

corobo 1794 days ago

Have them all hot and live rather than any sort of failover system. Keep everything in sync with OctoDNS or similar

DNS is fastest first* rather than main/failover. If AWS DNS was down your GCP DNS would have replied (if all is well) sooner than {timeout} so your visitor would still have a response

* Sort of. I think if the client doesn't get a reply from the server it picked randomly in 1s they move on to the next server, repeat until all fail
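The behaviour described in the footnote (pick a server at random, wait about a second, move on to the next, fail only when all have failed) might look roughly like this. It is a sketch under those stated assumptions, not the logic of any particular resolver; `query` is a hypothetical callback that sends one question to one server and raises `OSError` (which includes `TimeoutError`) when it gets no usable reply.

```python
import random
from typing import Callable, Optional, Sequence


def query_with_failover(
    name: str,
    nameservers: Sequence[str],
    query: Callable[[str, str, float], str],
    timeout: float = 1.0,
    rng: Optional[random.Random] = None,
) -> str:
    """Try nameservers in random order; move on when one times out or errors."""
    order = list(nameservers)
    (rng or random).shuffle(order)  # no primary: every listed server is equally valid
    last_error: Optional[Exception] = None
    for server in order:
        try:
            return query(server, name, timeout)
        except OSError as exc:  # TimeoutError is a subclass of OSError
            last_error = exc    # this server failed; fall through to the next one
    raise LookupError(f"all nameservers failed for {name}") from last_error
```

With nameservers from two providers in the list, an outage at one provider costs at most one timeout per lookup, provided the broken servers actually stop answering rather than returning bad data.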

NotEvil 1794 days ago

I think if Route53 was down, your DNS provider wouldn't be able to reach it. So it will go to the root, which will give it the GCP one too, so your DNS provider might try that.

(I don't know if this is how it works, but I think that's how it's supposed to work)

You typically have four name servers for a domain, but they don’t all have to be hosted with the same company. Very handy when your DNS provider decides to brag they are unhackable and the hackers reply by immediately hacking them followed by DDoSing them to death.

gregsadetsky 1794 days ago

gov.uk's traffic seems to be handled by Fastly, a well known CDN.

What I'm a bit surprised / unsure of is what happens when I run "dig ns gov.uk". The results are:

  gov.uk.     21559 IN  NS  ns1.surfnet.nl.
  gov.uk.     21559 IN  NS  auth50.ns.de.uu.net.
  gov.uk.     21559 IN  NS  ns3.ja.net.
  gov.uk.     21559 IN  NS  ns2.ja.net.
  gov.uk.     21559 IN  NS  ns0.ja.net.
  gov.uk.     21559 IN  NS  auth00.ns.de.uu.net.
  gov.uk.     21559 IN  NS  ns4.ja.net.

Who is ja.net , uu.net and surfnet.nl ..?

EDIT: I see that ja.net i.e. jisc.ac.uk "manages the second level domain .gov.uk" -- https://www.jisc.ac.uk/domain-registry . I imagine that uu.net and surfnet.nl are there for redundancy

sakisv 1789 days ago

Ah sorry, you're indeed right. Turns out it was just the .service.gov.uk domain that uses GCP and AWS - I just thought that applied to the parent domain too.

  $ dig NS service.gov.uk +short

  ns-cloud-e4.googledomains.com.
  ns-cloud-e3.googledomains.com.
  ns-cloud-e2.googledomains.com.
  ns-cloud-e1.googledomains.com.
  ns-831.awsdns-39.net.
  ns-1983.awsdns-55.co.uk.
  ns-117.awsdns-14.com.
  ns-1080.awsdns-07.org.

PaywallBuster 1794 days ago

  whois ja.net
    Domain Name: JA.NET
    Registry Domain ID: 499794_DOMAIN_NET-VRSN
    Registrar WHOIS Server: whois.demys.com
    Registrar URL: http://www.demys.com

"Demys is a leading provider of corporate domain name management and an ICANN accredited registrar"

  whois uu.net
    Domain Name: UU.NET
    Registry Domain ID: 5486163_DOMAIN_NET-VRSN
    Registrar WHOIS Server: whois.markmonitor.com

surfnet is just an ISP in Netherlands

https://www.surf.nl/

gregsadetsky 1794 days ago

Thanks

Is it possible to see if/where is gov.uk using GCP or AWS for its domain zones? From what I can see -- that's not the case? Or am I looking in the wrong place?

PaywallBuster 1794 days ago

I think you did the right query, maybe they're using it for different domain names?

paradite 1794 days ago

Last time I tried setting NS to both cloudflare and digital ocean in my domain registry, cloudflare sent me an email saying the configuration is invalid and asked me to revert. Am I doing something wrong?

mritzmann 1793 days ago

No, you have done everything right. At least from the point of view of DNS. That you can not use multiple nameservers is a limitation of Cloudflare (limit in the sense of: Cloudflare can only offer their services in the Free and Pro plan if they have full control over all nameservers).

paradite 1793 days ago

Thank you. I will look into alternative services on the thread then.

grishka 1794 days ago

And then there are Cloudflare and other Centralized Downtime Networks as another point of failure.

andoma 1794 days ago

Loled at this.

citrin_ru 1794 days ago

It is relatively easy to make DNS highly redundant: just put multiple DNS servers in data centers which are as independent as possible (different geo locations, different ISPs). You can also use different DNS software and different OSes (say BSD+Linux) to exclude correlated bugs. Root DNS servers AFAIK use different software for this reason.

Problems start when you want to easily make frequent changes and introduce complex software to manage DNS zones (and complexity usually comes with bugs).

hk1337 1794 days ago

The problem isn't DNS though, is it? The problem is that people don't necessarily use the redundancies on DNS?

The whole reason it takes a domain 24h to fully work with DNS is because it propagates the information to other DNS servers, thus making it not a centralized service.

jameshart 1794 days ago

DNS doesn't 'propagate' except in the very limited case of zone-transfer publication, which... nobody really relies on these days. Registrars tell you it takes 24 hours to propagate to stop you from complaining to them about your ISP's DNS caching policy. The reality is: recursing DNS servers have caches, they respect TTLs, and for the most part that means that DNS changes should fully wash through within an hour for most changes (less if you keep your TTLs shorter).

unilynx 1794 days ago

That differs per TLD though. In .nl updates are usually fully processed within the hour (they update the zone file twice per hour)

More accurately there are distributed caches, which expire on a simple timer basis, as opposed to updates being pushed immediately.

Relatively short TTLs are ubiquitous these days though.

tyingq 1794 days ago

It's an interesting question, as it's always been solved on the server side. All of the current problem is client side. That is, client resolvers that aren't using diverse providers, and only do things like round-robin with long timeouts.

kokey 1794 days ago

Anycast for the DNS IPs deals with most of the problems of clients not failing over elegantly when their primary DNS server is broken.

citrin_ru 1794 days ago

From a client (DNS recursor) point of view there is no primary server. There are just multiple NS records which are equal. If one of them is down it can introduce resolving delays, but they are usually small. At least if something like Unbound or Bind is used. Unbound, e.g., maintains an infra-cache where it tracks RTT and errors for each server and avoids servers which are down.
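The general idea of tracking RTT and errors per server, and steering away from servers that look down, can be sketched as below. This is not Unbound's algorithm; the class, the smoothing weights, and the `max_failures` threshold are all invented for illustration.

```python
from typing import Dict, List


class ServerStats:
    """Hypothetical per-server bookkeeping: smoothed RTT plus a consecutive-failure count."""

    def __init__(self, servers: List[str], max_failures: int = 3) -> None:
        # An RTT of 0.0 means "never measured", so untried servers are picked first.
        self._rtt: Dict[str, float] = {s: 0.0 for s in servers}
        self._failures: Dict[str, int] = {s: 0 for s in servers}
        self._max_failures = max_failures

    def record_success(self, server: str, rtt_ms: float) -> None:
        old = self._rtt[server]
        # Exponential moving average; the first sample is taken as-is.
        self._rtt[server] = rtt_ms if old == 0.0 else 0.75 * old + 0.25 * rtt_ms
        self._failures[server] = 0

    def record_failure(self, server: str) -> None:
        self._failures[server] += 1

    def pick(self) -> str:
        healthy = [s for s in self._rtt if self._failures[s] < self._max_failures]
        candidates = healthy or list(self._rtt)  # if everything looks down, try anyway
        return min(candidates, key=lambda s: self._rtt[s])
```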

arberx 1794 days ago

Yes: https://ens.domains/

jakeschaeffer 1794 days ago

https://handshake.org is the only project I've seen that actually solves the issue with a decentralized root zone file.

https://namebase.io is a "registrar" for it.

https://learn.namebase.io/starting-from-zero/how-to-get-a-na...

airstrike 1794 days ago

Why does this need to have the whole NFT / crypto / auction angle?

This is so convoluted it actually makes the whole thing a non-starter

fwip 1794 days ago

Decentralized control of a centralized finite resource (domain names) requires consensus. For example, Joe Smith and Joe Blow both want joe.com.

You want a protocol that gives consistent "global" state without any centralized / trusted users - blockchain/bitcoin is one of the only technical solutions to provide that.

I agree that it's a garbage solution in practice, but that's why it's got cryptoshit bundled in.

A potential different solution to DNS monopoly, if that is a problem that needs solving, is multiple name-resolution providers that have differing records on what name points where. (The tradeoff is that an owner may need to register their name with multiple different providers).

Agreed. Blockchain is a convoluted solution, but it’s a solution for distributed consensus, if one feels that’s required. But in general I would argue the current root system has served us well and is open and free.

The world you describe, effectively with multiple roots, is coming. Russia have a switch (they’ve even tested it), to anycast out the root DNS IPs within the country, and block them externally. In theory this doesn’t make another “internet” (if IP space is still globally routable,) but in practice it does. Don’t be surprised if other countries follow suit (should they fail to leverage control of current infra via ITU or something.)

toddh 1794 days ago

You can still hardcode IP addresses. Not sure most people realize DNS isn't actually needed, you know, except for convenience and all that.

tyingq 1794 days ago

The "Host:" header in http[s] pretty much killed that. Half the internet would be a Cloudflare error page if we moved back to ip addresses :)

BenjiWiebe 1793 days ago

Add the name/IP to your local hosts file. It all works great then. Until the server changes IPs, anyways.

I did this with a website I liked which had let the domain expire. It worked for quite some time, until the VPS/whatever expired too. Good thing the Internet Archive is a thing.

Meh. Without DNS, or something similar, there really is no internet.

Obviously you are technically correct.

toddh 1793 days ago

The internet gets along quite fine without DNS. Packets route from network to network. DNS is an application-layer protocol. People often confuse the web with the internet. We use phone numbers for phone calls. It's conceivable with IPv6 you could nail up your IP address and use a QR code to make the addresses accessible. In a hundred years will DNS still be necessary? I don't think so.

It’s one of the most successful, global, distributed databases of all time.

What’s the single point of failure?

foobarbazetc 1794 days ago

Absolutely amazing how many billion $+ companies are single homed for DNS.

I wonder how much they spend on multi-AZ redundant architectures...

orblivion 1794 days ago

So here's a weird question: Supposing companies multi-home for DNS, or whatever other essential service, via multiple service providers.

Whatever multi-home means, why can't there just be one service provider that does that? And are we sure that these service providers aren't already doing that as best we might hope for? (For instance, Amazon already has multiple zones, etc.)

I suppose the one thing this can't protect against is some sort of political (broadly defined) threat related to the company itself.

lxgr 1794 days ago

> Whatever multi-home means, why can't there just be one service provider that does that?

Many of these outages are due to pushing broken artifacts or configuration to production.

A single provider can pretty easily offer geographic or network topological redundancy, but administrative and/or technological independence is pretty hard to achieve in a single company.

orblivion 1794 days ago

I mean, I guess what I'm saying is that in theory a single provider could purposely keep two different departments that manage their own artifacts independently.

Records have to be kept in sync.

If one dept deletes a record and the other doesn’t, how do you decide who’s right?

You could add a third dept that gives them both orders, but now that third dept is a single point of failure.

orblivion 1793 days ago

If I were a customer of two different companies for the sake of redundancy, wouldn't I have that same challenge? I could be my own point of failure.

Though, I suppose if I'm responsible for it, I fix it faster for myself.

https://kb.easydns.com/knowledge/easyroute53/

knute 1794 days ago

I believe EasyDNS can automatically push DNS settings to Route53 to host DNS in AWS. Doesn't protect you from fat-fingering a change, but you should be resilient to either EasyDNS or Route53 going down.

toast0 1794 days ago

Using multiple providers for mostly static DNS is easy, pick one as primary and AXFR to the other and notifications and whatever. Or it's not too hard to keep a zone file in source control and sync it to the providers.
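For the "zone file in source control, synced to the providers" approach, the core check is a set difference between the desired records and what each provider currently serves. The sketch below assumes the records have already been fetched into plain tuples; the `Record` shape and the function name are hypothetical, and tools such as OctoDNS (linked earlier in the thread) handle the provider-specific parts.

```python
from typing import Dict, Set, Tuple

# (name, type, value), for example ("www.example.com.", "A", "192.0.2.1")
Record = Tuple[str, str, str]


def zone_drift(
    desired: Set[Record],
    provider_records: Dict[str, Set[Record]],
) -> Dict[str, Dict[str, Set[Record]]]:
    """For each provider, report records it is missing and records it should not have."""
    report: Dict[str, Dict[str, Set[Record]]] = {}
    for provider, actual in provider_records.items():
        missing = desired - actual
        extra = actual - desired
        if missing or extra:
            report[provider] = {"missing": missing, "extra": extra}
    return report
```

An empty report means both providers serve the same zone. This only covers static records, which matches the point above that health-checked or geo-targeted answers are much harder to keep in parity.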

Using multiple providers for fancy DNS, like only providing IPs that pass healthchecks or geotargetting users to datacenters gets pretty hard, because the different providers have similar capabilities, but no uniform interface, so you've either got to do it manually, or you have to build out your own abstraction that is probably limiting.

If possible, insourcing DNS makes the most sense to me, because if you can't keep your service online, it's not the worst if your DNS is offline; and if you can keep your service online, you probably won't mess up your DNS too badly.

jfoutz 1794 days ago

So much this. Keeping feature by feature parity is the tricky part.

nexuist 1794 days ago

Might be survivorship bias. Multi-AZ arch protects against all other failures, so the only one that remains visible to the outside world is DNS.

Most CDNs offer huge incentives for sending them more traffic, a lot of time you end up in a contract obligated to handle X requests and Y gigabytes of traffic per month. But personally I believe you should never have a single provider for anything - particularly when it’s acceptable for a company to cut you off with no warning or recourse.

Problem is, if you're on Akamai’s CDN, only Akamai knows where the local caches are. You need to be on their DNS only.

delgaudm 1794 days ago

Lastpass is down, so if you use lastpass the effect is significantly compounded.

Do they not cache everything locally? I'd have thought a password manager/secure data store would work offline.

stusmall 1794 days ago

They do.

nonfamous 1794 days ago

It still works in offline mode. You can’t update passwords, but you can retrieve them.

compscistd 1794 days ago

To enable offline mode, I had to turn on airplane mode on my phone before logging in.

lowbloodsugar 1794 days ago

So many sites being reported as down, but change your DNS to something else (e.g. Google 8.8.8.8 and 8.8.4.4) and, after flushing your DNS cache, the sites are available. I was unable to get to ups.com or newegg.com (why yes, I am expecting a new toy), but after switching DNS and flushing DNS cache, I was able to get to both.

Specifically, 1.1.1.1 provided bad addresses (as opposed to no addresses), and removing 1.1.1.1 fixed my problem. By then it had returned a bunch of bad addresses and I had to flush my DNS cache.

aix1 1793 days ago

Could you give an example of what you mean by a "bad address" in this context?

lowbloodsugar 1793 days ago

This is from the time of incident:

Server: 1.1.1.1 Address: 1.1.1.1#53

Non-authoritative answer: Name: newegg.com Address: 23.35.185.6

Server: 8.8.8.8 Address: 8.8.8.8#53

Non-authoritative answer: Name: newegg.com Address: 104.80.92.252

104.80.92.252 is newegg.com

23.35.185.6 is a server that provides an error message.

So 1.1.1.1 lied. The proper response would be to reply "I don't recognize that domain". Instead it said, "yeah, I know that, it's here..."

Newegg was not down, and when I got macos to forget what it had cached from 1.1.1.1 I was able to use newegg.com fine.

thunfisch 1794 days ago

Yep, all our EdgeDNS zones as well as DSD edgekeys are just returning SERVFAILs. Many big German websites are down right now.

zhdc1 1794 days ago

Several unrelated websites I was trying to visit are down. I figured I would find the answer on HN : )

mariusseufzer 1794 days ago

Same haha

knaik94 1794 days ago

I am surprised financial institutions don't have any regulation for redundancy. The one that stuck out to me is the Navy Federal Credit Union website being down. I have not had any issues logging into mobile though for some of the reported sites.

deckard1 1794 days ago

this is prime shit Hacker News says right here. Wait until you learn banks close on Sunday. Or have maintenance windows for their website, ATM, etc.

toomuchtodo 1794 days ago

Commercial banks are held to a different operational resiliency standard than financial infrastructure.

(a component of my consulting work is reporting to financial regulators for institutions)

Terretta 1794 days ago

> financial institutions don't have any regulation for redundancy

As CTO of a bank, I wasn’t aware of this. So either we wasted a ton of money and time constantly upgrading redundancy and business continuity technologies to satisfy our regulators… or this statement could be mistaken.

christophilus 1794 days ago

I'm not sure how easy it would be to regulate. But yeah. I've got a few short term trades in my brokerage account, and outages really throw a wrench into those.

xyzzy21 1794 days ago

The way to regulate it is like anything else: if they fail to meet QoS uptimes, they get fined in 6-8 figures for every minute of loss.

brentm 1794 days ago

CapitalOne has a broken login which is pretty surprising to me.

cryptoz 1794 days ago

All major Canadian banks were down.

cbono1 1794 days ago

Why would Google and Amazon be on the downdetector list or experiencing issues? Don't they have their own DNS / nameservers separate from Akamai?

sathackr 1794 days ago

because the way downdetector works is it just basically counts how many people are searching/visiting for <site> down and if it's much higher than typical it flags the site as down.

So if everyone searched "is google down" and visited the link on downdetector that was returned in the search, that would add to the downdetector count for that site.

Downdetector doesn't actually know if the site is up or down.

https://www.speedtest.net/insights/blog/how-downdetector-wor...

k1t 1794 days ago

I found this hard to believe, but it's correct.

Downdetector only reports an issue if a significant number of users are impacted. To that end, Downdetector calculates a baseline volume of typical problem reports for each service monitored, based on the average number of reports for that given time of day over the last year. Downdetector’s incident detection system compares the current number of problem reports to this baseline and only reports an issue if the current volume significantly exceeds the typical volume of reports.
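The baseline comparison described in that quote can be sketched as a simple threshold test. The quote only says the current volume must "significantly exceed" the typical volume, so the `factor` and `min_reports` values below are invented for illustration and are not Downdetector's actual parameters.

```python
from statistics import mean
from typing import Sequence


def is_incident(
    current_reports: int,
    same_hour_history: Sequence[int],
    factor: float = 3.0,      # assumption: how far above baseline counts as "significant"
    min_reports: int = 10,    # assumption: ignore tiny absolute numbers
) -> bool:
    """Flag an incident when current reports far exceed the historical average for this hour."""
    baseline = mean(same_hour_history) if same_hour_history else 0.0
    return current_reports >= min_reports and current_reports > factor * baseline
```

Because the input is user reports rather than probes, a wave of people misattributing an outage (for example to AWS or Google) would trip this check just as a real outage would, which matches what the thread observed.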

What’s hard to believe? Downdetector's well known for being almost, but not quite, useless.

Probably reported Google as “down” because a whole bunch of people use the word “Google” when they mean “internet”.

brentm 1794 days ago

A more proper name might be PeopleThinkItsDownDetector.com

cbono1 1794 days ago

Not nearly as SEO friendly

mc32 1794 days ago

So how do they reset status? The number of queries going down signifies return to normal status?

dylan604 1794 days ago

Some CEO calls another CEO and makes a deal?

Yep

memco 1794 days ago

Was just browsing a website where the first page of a query worked, but visiting page 2 of the results was returning a DNS error. Was curious how and why only part of the site was down, but it looks like this was the problem as now the whole site is down.

katbyte 1794 days ago

aren't short DNS TTLs great?

sebmellen 1794 days ago

Is this a serious argument for long TTLs? Always wondered why they exist… How interesting.

slim 1794 days ago

Yes it is. The longer the TTL the longer you stay independent from third parties. It's what makes the internet stable.

remram 1794 days ago

Long TTL makes you independent from DNS third parties, in that your name is still known by clients if DNS is down.

Short TTL makes you independent from hosting third parties, in that you can quickly change which hosting provider your domain name points to.

You can't win this one by only changing your TTL. The best solution is to use short TTLs and multiple nameservers on different providers.
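A rough way to see the trade-off in numbers is the toy model below. It assumes a single client that cached one answer shortly before the authoritative servers went down, with no serve-stale behaviour in the resolver. The function name and all the timings are made up for illustration.

```python
def seconds_unresolvable(cached_at, ttl, outage_start, outage_end):
    """Seconds during the outage in which the client cannot resolve the name."""
    cache_expires = cached_at + ttl
    # The client only fails while the outage is ongoing AND its cached
    # answer has already expired.
    fail_from = max(outage_start, cache_expires)
    return max(0, outage_end - fail_from)

# One-hour outage starting at t=1000s; the answer was cached at t=990s.
for ttl in (30, 300, 7200):
    print(ttl, seconds_unresolvable(990, ttl, 1000, 4600))
# 30 3580
# 300 3310
# 7200 0
```

The flip side is that with the 7200-second TTL a record change can take up to two hours to reach every client, which is the hosting-side dependence described above.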

sebyx07 1794 days ago

The good parts of centralisation

schemathings 1794 days ago

Possibly related .. Verizon peering issues / ASN701 at Equinix NY2 in Secaucus NJ

mvanaltvorst 1794 days ago

What role does Akamai Edge DNS play in normal internet traffic? DNS responses usually get cached, as far as I understand correctly. And it is usually possible to change your DNS server to e.g. Google's and circumvent the outage. Does Akamai Edge DNS play a role on the server side?

uncertainrhymes 1794 days ago

If you use a CDN to front your traffic, you need the CNAME for www (or whatever) to be pointing at their DNS infrastructure, so they can return whichever closest POP is going to serve your traffic.

e.g. dig @1.1.1.1 www.nvidia.com +trace

... various things from the root ...

www.nvidia.com. 7200 IN CNAME www.nvidia.com.edgekey.net. ;; Received 83 bytes from 208.94.148.13#53(ns5.dnsmadeeasy.com) in 35 ms

So the main DNS is fine, but it'll never get an A record because the last link in the chain is toast -- edgekey being Akamai in this case, but all CDNs do this so they can route traffic. Normally, this is a good thing so they can shift traffic within 30 seconds on their side. Unfortunately, it also means it would take nvidia up to two hours (the 7200-second TTL on that CNAME) to point away from Akamai.
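The failure mode can be sketched with a toy resolver in Python. This is only an in-memory model, not real DNS: the names under example.com / example.net, the provider labels and the 192.0.2.10 address are placeholders, and the `resolve` function is hypothetical.

```python
# Toy records: name -> (record type, value, provider that serves the record).
RECORDS = {
    "www.example.com": ("CNAME", "www.example.com.cdn.example.net", "primary-dns"),
    "www.example.com.cdn.example.net": ("A", "192.0.2.10", "cdn-dns"),
}

def resolve(name, down_providers=frozenset(), max_hops=8):
    """Follow CNAMEs until an A record is found; return (status, answer)."""
    for _ in range(max_hops):
        record = RECORDS.get(name)
        if record is None:
            return ("NXDOMAIN", None)
        rtype, value, provider = record
        if provider in down_providers:
            # Nobody authoritative answers for this link, so the whole
            # lookup fails even though earlier links were fine.
            return ("SERVFAIL", None)
        if rtype == "A":
            return ("NOERROR", value)
        name = value  # CNAME: restart the lookup at the target name
    return ("SERVFAIL", None)  # chain too long or looping

print(resolve("www.example.com"))
# ('NOERROR', '192.0.2.10')
print(resolve("www.example.com", down_providers={"cdn-dns"}))
# ('SERVFAIL', None)
```

The first hop is served by a healthy provider in both calls. The lookup still ends in SERVFAIL once the provider behind the CNAME target is marked as down.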

carlsborg 1794 days ago

Looks like this: the affected subdomains are CNAMEd to the akamai CDN, and the Nameserver for those are/were down.

So for example:

Top level domain for nvidia resolved fine..

dig @1.1.1.1 nvidia.com => status: NOERROR, Nameservers are ns6.dnsmadeeasy.com

But the website didn't. dig @1.1.1.1 www.nvidia.com => status: SERVFAIL,

The nameserver for www.nvidia.com resolved to the Akamai nameserver, which had a problem..

dig @1.1.1.1 www.nvidia.com NS => CNAME e33907.a.akamaiedge.net.

r1ch 1794 days ago

The trend these days is DNS TTLs of 60 - 300 seconds, to allow "Cloud agility" or something, so sites are exposed to a much larger risk of authoritative nameservers going down.

jameshart 1794 days ago

You say that like it's a bad idea.

Services like Akamai use short TTLs for their edge services for a variety of reasons, not least because if one of their edge servers goes offline (for planned or unplanned reasons) it lets them sub in a new one and have it receive traffic immediately, rather than have a bunch of clients continue trying to talk to a dead node. So sure, you can increase those TTLs to trade 'what if the DNS server goes down?' risk with 'what if the edge server goes down?' risk...

But keeping the edge servers up and running is probably a lot harder - they need to scale more to handle traffic load, they have to actually handle client data, TLS termination, much more complex configuration.... so if I'm placing bets on which of those things is more likely to die on me, it's the edge node, not the DNS server.

NeckBeardPrince 1794 days ago

> What role does Akamai Edge DNS play in normal internet traffic?

Clearly a big one.

twalichiewicz 1794 days ago

Posted this in the thread about the travel websites being down, but it seems Fidelity is entirely impossible to sign in to / trade on right now.

https://www.bloomberg.com/news/articles/2021-07-22/multiple-...

SandroG 1794 days ago

Is this related to:

Multiple websites including DraftKings, Airbnb, FedEx, Delta and others appear to be experiencing issues.

00deadbeef 1794 days ago

Figured this out almost 30 minutes before they bothered to update their status page.

realSaddy 1794 days ago

This is affecting Steam as well

ssully 1794 days ago

It is impacting a lot of things: https://downdetector.com/

00deadbeef 1794 days ago

Well it's been an hour now since I first noticed the effects and their service status still has no useful information or ETA for a fix. It's just an "emerging issue".

jonnyone 1794 days ago

The affected sites that I use are now working. Check again.

testplzignore 1794 days ago

Strange thing about the duration of this outage... From logs I have, it seems to have lasted exactly one hour, from 15:38 to 16:38. Their Twitter account also said "disruption lasted up to an hour", though they incorrectly said it started at 15:46 (did it take 8 minutes for their monitoring to notice?).

That makes me think that whatever the fix was, it had to wait for some one-hour cache to expire before it took effect. I'm very interested to find out what the cache issue was, more so than what the original bug was.

swarnie_ 1794 days ago

I love seeing these issues reverberate around the internet.

This time I think /r/sysadmin pegged the issue first, great sub.

nowahe 1794 days ago

I'm in the middle of a migration from Akamai to Cloudfront, time to take a break I guess

Scoundreller 1794 days ago

All yuor data are belong to us

soheil 1794 days ago

App Store on MacOS is down!

aliswe 1794 days ago

Not only that, their support telephone line (in Sweden) was down as well

xyzzy21 1794 days ago

And people wonder why I try to avoid depending on online anything...

didjathinkmess 1794 days ago

Cyberpolygon already? Thought we had at least a month or two

penultimatebro 1794 days ago

Shh, normies are not ready for that.

It’s just a completely random DNS outage, nothing more.

SjorsVG 1794 days ago

Many bank systems are disrupted by this in the Netherlands

ricardo81 1794 days ago

My UK bank (HBOS) seemed to have 'online banking unavailable' though their site was up. No doubt related.

SjorsVG 1794 days ago

Many banks in the Netherlands are affected by this.

tru3_power 1794 days ago

Any idea on cause? Ddos or hardware failure?

MrRadar 1794 days ago

Widespread issues like this on major CDNs tend to be configuration errors.

tootie 1794 days ago

Cloudflare seems to be struggling too. Not sure if they have some dependency on Akamai or if this portends something much worse

_joel 1794 days ago

So that's why the NHS website is down

jamespwilliams 1794 days ago

Back up now by the looks of it

https://www.apple.com/go/

Eikon 1794 days ago

This is affecting apple as well

iruoy 1794 days ago

For some reason that url doesn't work for me, but https://www.apple.com/ and https://www.apple.com/nl/ do.

remram 1794 days ago

That fails with a 404 for me, which is probably not related to DNS at all?

archive.org seems to indicate there was never anything there...

jdlyga 1794 days ago

Oops, someone unplugged the DNS machine

blondie9x 1794 days ago

Looks like it is fixed now!

bpye 1794 days ago

This is apparently why I can't book my COVID vaccine appointment...

_joel 1794 days ago

Yes, was trying to do the same. Getting this 2nd jab has been a nightmare. Places listed as walk-in having Moderna don't, and they ran out of it when I went to get my scheduled jab. Ringing 119 just ends up in a dead line, then this outage. Fun.

throwawaysha 1794 days ago

I ran DNS servers, among other things, in the late 90s with better uptime than these "multi-DC/AZ/geo redundant" services everyone uses these days.

With all due respect, having also run auth DNS servers in the 90s, and seen the inside of Akamai’s CDN/DNS setup more recently, it isn’t remotely at the same level of scale or sophistication.

throwawaysha 1793 days ago

"Scale and sophistication" scale relatively with time. Those servers we ran were relatively at the same level of scale and sophistication for their time. The only differentiator here is uptime, which has gotten worse as time has gone on. Five 9s used to be the standard. Three 9s seems to be the new standard.
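For reference, the arithmetic behind "nines" is simple enough to sketch in Python. The function name is made up, and the year is taken as 365 days.

```python
def downtime_per_year(nines, seconds_per_year=365 * 24 * 3600):
    """Allowed downtime in seconds per year for an availability of N nines."""
    unavailability = 10 ** -nines   # e.g. 3 nines -> 0.001
    return seconds_per_year * unavailability

for n in (3, 5):
    print(n, "nines:", round(downtime_per_year(n) / 60, 1), "minutes/year")
# 3 nines: 525.6 minutes/year
# 5 nines: 5.3 minutes/year
```

By that measure a single one-hour outage fits inside a three-nines budget for the year, but uses up more than ten years' worth of a five-nines budget.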

fredski42 1794 days ago

I thought DNS was supposed to be resilient

topspin 1794 days ago

DNS is designed to be fault tolerant. Such a design, however, is often not leveraged correctly; the implementation of DNS can be and frequently is subject to SPOFs.

[0] https://news.ycombinator.com/item?id=27893482

rvz 1794 days ago

Probably Akamai needs to use Kubernetes.

EDIT: So HN can't even take a joke after this? [0]

whitepoplar 1794 days ago

Probably caused by Kubernetes

[0] https://news.ycombinator.com/item?id=27893482

rvz 1794 days ago

That's even worse if true; despite HNers creating a storm in a tea cup on DOSing a blog of a service not using K8s when having a blog is not their main service. [0].

Either way, the joke is now on the HNers in that thread.

unemphysbro 1794 days ago

come on, this is funny. HN needs to lighten-up.

mdtancsa 1794 days ago

Sheesh, So yesterday! :)

simonswords82 1794 days ago

I'm sick and tired of these types of services (I'm looking at you too Cloudflare) going down and taking otherwise healthy websites down with them.

ceejayoz 1794 days ago

Most websites using Akamai aren't gonna be "otherwise healthy" without the CDN handling most of the load.

tootie 1794 days ago

It was fastly last time.

simonswords82 1794 days ago

True but cloudflare have been guilty of downtime too.

ceejayoz 1794 days ago

There aren't many sites that aren't, including "otherwise healthy websites" hosted without a CDN.

TheSwordsman 1794 days ago

I think this is a factually true statement if your business uses any computers. ;)

sammy2244 1794 days ago

Cloudflare hasn't had an outage in a long time. And when they do, they are upfront about it and post a detailed post-mortem.

gianpaj 1794 days ago

https://www.interactivebrokers.co.uk/ , a trading platform, is also down :(

How am I going to sell my AMC stock...

swarnie_ 1794 days ago

You don't, you hold the dumb, overpriced stock as a reminder for future, better informed investing.