by kickling 1893 days ago
Love the status page right now. https://developers.facebook.com/status/dashboard/

PS. Instagram app/site, whatsapp.com, messenger.com, oculus.com also down, anything else?

UPDATE: Everything back up! (Even the status page)

8 comments

Finally a status page that updates the moment an issue starts
for posterity:

  <?xml version="1.0"?>
  <!DOCTYPE html>
  <html lang="en" id="facebook">
    <head>
      <title>Error</title>
      <meta charset="utf-8"/>
      <meta http-equiv="Cache-Control" content="no-cache"/>
      <meta name="robots" content="noindex,nofollow"/>
      <style>
        html, body { color: #333; font-family: 'Lucida Grande', 'Tahoma', 'Verdana', 'Arial', sans-serif; margin: 0; padding: 0; text-align: center;}
        #header { height: 30px; padding-bottom: 10px; padding-top: 10px; text-align: center;}
        #icon { width: 30px;}
        .core { margin: auto; padding: 1em 0; text-align: left; width: 904px;}
        h1 { font-size: 18px;}
        p { font-size: 13px;}
        .footer { border-top: 1px solid #ddd; color: #777; float: left; font-size: 11px; padding: 5px 8px 6px 0; width: 904px;}
      </style>
    </head>
    <body>
      <div id="header">
        <a href="//www.facebook.com/">
          <img id="icon" src="//static.facebook.com/images/logos/facebook_2x.png"/>
        </a>
      </div>
      <div class="core">
        <h1>Sorry, something went wrong.</h1>
        <p>We're working on getting this fixed as soon as we can.</p>
        <p>
          <a id="back" href="//www.facebook.com/">Go Back</a>
        </p>
        <div class="footer"> Facebook &#xA9; 2021 &#xB7; <a href="//www.facebook.com/help/">Help</a></div>
      </div>
      <script>
        document.getElementById("back").onclick = function() {
          if (history.length > 1) {
            history.back();
            return false;
          }
        };
      </script>
    </body>
  </html>
And the png does not load.
And here are the headers:

    # curl -i https://www.facebook.com/
    HTTP/1.1 500 Internal Server Error
    X-Frame-Options: DENY
    X-XSS-Protection: 0
    X-Content-Type-Options: nosniff
    Strict-Transport-Security: max-age=15552000; preload
    Set-Cookie: ...
    Expires: Sat, 01 Jan 2000 00:00:00 GMT
    Cache-Control: private, no-cache, no-store, must-revalidate
    Vary: Accept-Encoding
    Pragma: no-cache
    x-fb-rlafr: 0
    Content-Type: text/html; charset="utf-8"
    X-FB-Debug: ...
    Date: Thu, 08 Apr 2021 21:41:49 GMT
    Priority: u=3,i
    Transfer-Encoding: chunked
    Alt-Svc: h3-29=":443"; ma=3600,h3-27=":443"; ma=3600
    Connection: keep-alive

    <!DOCTYPE html><html lang="en" id="facebook"><head><title>Error</title>...
> Expires: Sat, 01 Jan 2000 00:00:00 GMT

The year 2000 doesn’t sound right.

This is common, it essentially means “don’t cache this”.
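To illustrate the "don't cache this" reading: under HTTP caching rules, a response whose Expires time is at or before its Date is already stale on arrival. A minimal Python sketch using the two header values from the curl output above; the helper name `is_already_expired` is made up for this example, and real caches also look at Cache-Control (which this response sets to `private, no-cache, no-store, must-revalidate`).

```python
from email.utils import parsedate_to_datetime

# Header values copied from the curl output earlier in the thread.
expires = parsedate_to_datetime("Sat, 01 Jan 2000 00:00:00 GMT")
date = parsedate_to_datetime("Thu, 08 Apr 2021 21:41:49 GMT")


def is_already_expired(expires_at, served_at):
    """Return True when the Expires time is not later than the response Date."""
    return expires_at <= served_at


# Hypothetical helper applied to the real header values.
already_stale = is_already_expired(expires, date)
print(already_stale)
```

So the year 2000 is just a conveniently old timestamp, not a sign of a broken clock.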
Ok, I thought this was related to SSL certs and maybe wrong dates might have caused FB to go haywire.
Seriously, they host the status page on their own infrastructure? That's ... not smart.
Every company's actual status page (and news feed) is on Twitter, no matter what else they say. I don't know where Twitter hosts their real status page. Probably Facebook.

Notably, Facebook hasn't updated theirs, though.

> every company's actual status page is on Twitter

Clearly you have not experienced the absolute joy of a mission-critical locked-in-vendor B2B SaaS with no status page, no active Twitter presence, and the effective status page being a chat widget that routes to a person overseas who says "oh yeah this is a known outage on our main product please stay tuned" but there's no attempt to proactively make it visible to clients that they know there is an outage.

EDIT: fun fact, said vendor lets us use a subdomain we own, so we just route them through a Cloudflare Worker that injects Sentry into their HTML, so that we can monitor errors ourselves and raise tickets with them pretending that we know less than we actually know, because somehow we have a better observability culture than a SaaS vendor that's been around for 20 years. Don't underestimate the difference between vendor.mybrand.com and mybrand.vendor.com on a feature matrix, it may save your sanity.
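The poster's actual Worker isn't shown, so here is only a rough Python sketch of the rewriting step it describes: take the vendor's HTML as it passes through your own subdomain and add a monitoring script tag before `</head>`. The URL and the function name `inject_script` are hypothetical, and a real Cloudflare Worker would be written in JavaScript rather than Python.

```python
import re

# Hypothetical URL; the thread does not say which script the poster injects.
MONITORING_SRC = "https://monitoring.example.com/sdk.js"

# Matches the closing head tag regardless of case or trailing whitespace.
HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_script(html, src=MONITORING_SRC):
    """Insert a <script> tag just before the first </head>; other HTML is untouched."""
    tag = f'<script src="{src}"></script>'
    return HEAD_CLOSE.sub(lambda m: tag + m.group(0), html, count=1)


page = "<html><head><title>Vendor</title></head><body>Hi</body></html>"
rewritten = inject_script(page)
print(rewritten)
```

This only works because the vendor's pages are served from a hostname you control, which is the vendor.mybrand.com versus mybrand.vendor.com point.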

During the recent Microsoft Azure AD outage, I attempted to use the chat widget. Selecting "Technical support" or "Billing Support" both seemed to return 500 status codes while the "Sales Support" routed somewhere else and connected me to a person, perfectly fine. Presumably the sales team don't use Azure for anything.
Circa 2008-2009 when I was at FB I wrote a dashboard widget for the ops team that scraped Twitter for mentions of phrases like "Facebook down" in the past 5m. It was in use for a while.
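A rough sketch of what such a widget's core check might look like, assuming the posts have already been fetched as (timestamp, text) pairs. The phrase list, the function name `count_recent_mentions`, and the sample posts are invented for illustration; nothing here reflects the original tool's code.

```python
from datetime import datetime, timedelta

PHRASES = ("facebook down",)  # assumed phrase list
WINDOW = timedelta(minutes=5)  # the "past 5m" from the comment


def count_recent_mentions(posts, now, phrases=PHRASES, window=WINDOW):
    """Count posts newer than `now - window` whose text contains any phrase."""
    cutoff = now - window
    return sum(
        1
        for posted_at, text in posts
        if posted_at >= cutoff and any(p in text.lower() for p in phrases)
    )


# Made-up sample data, not real posts.
now = datetime(2021, 4, 8, 21, 45)
sample_posts = [
    (datetime(2021, 4, 8, 21, 44), "Is Facebook down for anyone else?"),
    (datetime(2021, 4, 8, 21, 42), "facebook down again"),
    (datetime(2021, 4, 8, 21, 30), "Facebook down earlier, back now"),
    (datetime(2021, 4, 8, 21, 43), "lunch was great"),
]
recent = count_recent_mentions(sample_posts, now)
print(recent)
```

A dashboard would presumably alert when this count jumps well above its usual baseline.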
> I don't know where Twitter hosts their real status page

What are the odds it's some IRC bot running under someone's desk?

Seems like Giphy is up at least, also facebook owned.
Anecdotally, whatsapp worked for me (message was sent and received successfully) though Facebook gave an error. I was already logged into both services.
whatsapp.com as in the web-based client, which is down.
It loads fine for me and shows no downtime or incidents. It even says "Facebook Platform is Healthy"
Yup, everything just came back online.
Now I am unable to connect at all:

    ~# wget https://developers.facebook.com/status/dashboard/
    Connecting to developers.facebook.com (157.240.206.16:443)
    wget: can't connect to remote host (157.240.206.16): Operation timed out
Try posting
Not here. Status page still does not load at all.
same, and can't visit the site at all
Do they share infrastructure?
Given the fact that it's also down, I'd say yes
It could be some rouge admin who actually deleted Facebook.
At that size of a company there's usually blast radius restrictions and per-role permissions. I don't expect anyone has enough rights to "delete Facebook" on their own.
I guarantee you that there are 100+ people who could take Facebook down for 24+ hours if they went rogue.

For example the people responsible for the bootup scripts of Facebook infra could sneak a "0 0 1 * * /bin/rm -rf ${TEMPDIR}/*" into crontab... They'd set the commit message as "clear out temp monthly" and it would get deployed across the entire fleet, until on the first of next month every disk at Facebook gets erased because TEMPDIR isn't defined...

I guess they have enough pending stock to deter them...
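For anyone wondering why an undefined TEMPDIR matters in the crontab line above: by default a POSIX shell expands an unset variable to an empty string, so `${TEMPDIR}/*` becomes `/*`. The Python sketch below only simulates that expansion (it runs no shell and deletes nothing), and also mimics the `${NAME:?}` form, which makes the shell abort instead. The function names `expand_like_sh` and `expand_strict` are made up for this example.

```python
import re

VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def expand_like_sh(command, env):
    """Replace ${NAME} with its value, or with "" when NAME is unset (sh default)."""
    return VAR_PATTERN.sub(lambda m: env.get(m.group(1), ""), command)


def expand_strict(command, env):
    """Refuse to expand when a variable is unset or empty, like ${NAME:?} in sh."""
    def lookup(match):
        name = match.group(1)
        value = env.get(name, "")
        if not value:
            raise KeyError(f"{name} is unset or empty")
        return value
    return VAR_PATTERN.sub(lookup, command)


command = "/bin/rm -rf ${TEMPDIR}/*"
unsafe = expand_like_sh(command, {})                    # TEMPDIR not defined
safe = expand_like_sh(command, {"TEMPDIR": "/tmp/job"})  # TEMPDIR defined
print(unsafe)
print(safe)
```

Calling `expand_strict(command, {})` raises instead of producing a command, which is the behaviour you would want from the real script.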

This wouldn't "delete" Facebook or many much smaller companies. It would result in maybe a small outage and get restored immediately in most cases. It's also an infra change you'd need across many systems - this isn't possible as a single change "across entire fleet".

This is not how non-trivial services work.

That’s...not how any of this works. You can’t just change integrity-bearing things without FIM systems kicking in. And you’d need collusion to get something mainlined that would bypass that.
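For readers unfamiliar with FIM (file integrity monitoring), the basic idea is to record a hash of each watched file and flag any file whose hash later differs. A toy Python sketch under that assumption; the function names and the `boot.sh` file are invented, real FIM products do far more (protected baselines, alerting, change approval), and this says nothing about how Facebook does it.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(paths):
    """Record a baseline mapping of path -> hash."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline, paths):
    """Return paths whose current hash differs from (or is missing in) the baseline."""
    return sorted(str(p) for p in paths if baseline.get(str(p)) != sha256_of(p))


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "boot.sh"  # hypothetical watched file
    script.write_text("echo booting\n")
    baseline = snapshot([script])
    unchanged = changed_files(baseline, [script])

    # Simulate an unreviewed edit to the watched file.
    script.write_text("echo booting\nrm -rf ${TEMPDIR}/*\n")
    tampered = changed_files(baseline, [script])
    tampered_names = [Path(p).name for p in tampered]

print(unchanged, tampered_names)
```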
How though? Every past author of that script would be notified of such a change. It'd be insane if all of them would pretend they didn't see it and accept that change.
Facebook servers do not have cron installed.
one could hope
Like a moulin rouge admin?
You got me there :D
Time for them to hit the gym and call a lawyer!
Yeah, that's what I was wondering. Strange that there is a SPOF for Facebook/Insta/Whatsapp.
I had the impression that they are still run like mostly independent companies who only share some data on the backend.
AFAIK, I remember reading something about Instagram moving into Facebook datacentres some time ago. I believe they were on AWS before the acquisition.
Yeah that's what I was thinking too, so this is probably DNS related.
Might I ask what’s SPOF?
Single Point of Failure.
Thanks
Single Point Of Failure
Ta
Even the status page is down
I like that it doesn't know that anything went wrong.