by citizenpaul 1475 days ago
>Facebook for a day due to a bad config is going to cost more money than a century of someone writing code.

I simply don't believe this is true. It's one of those things that has become "fact" through repetition. I don't think there is even a way to prove it if you wanted to. I don't think Amazon loses $X million if the site is down for an hour. I think people just say "oh, it's down," come back later, and buy exactly what they were going to buy anyway.

Perhaps they have to pay out some SLA-type stuff to advertisers. However, I don't think a single outage at a major tech company has lasted 24 hours in the last 20 years. What counts as a "major tech company" is of course debatable.


The reason they aren’t down more is that they have systems and processes to keep them up. It generally takes multiple failures for them to go down in any meaningful way.

That being said, if your service went down for 24 hours because of a bad code change, it probably means your service goes down a lot, because whatever caused that 24-hour outage probably points to a systemic lack of process and discipline within the organization.

Facebook was down for 6 hours last year during the day (in the western world), so that was most likely very impactful. And as wonderwonder said, advertising is an eyeballs game, and if no one is on the site then they're not seeing, clicking, or attributing to ads. Any non-monopoly would take a decent hit from downtime, as customers either spend their time on other sites or buy more time-sensitive things elsewhere.

edit: There are also a lot of revenue-impacting things short of downtime, especially given the automated and ML-heavy paths for most of these sites. A few slightly broken features in a key model can cause a 1% drop in your revenue, which for Facebook is over a billion a year.

Facebook does not make money selling physical items, though; it makes money on page views and clicks. If the site is down, they lose all of those ad views and clicks. They are gone because the time that would have been spent viewing them, and all of the other ads, is gone.

I have not worked in the ad space, so I'm certainly ignorant of the day-to-day mechanics. How can you prove that people didn't just come back 6 hours later and spend the same time clicking and viewing the same ads, just in a different timeframe? Is the cost of an ad at 6PM really that much greater than at 10PM?

IME SLAs rarely pay out enough to really make them worth pursuing. However, I have gotten a couple of payouts from SLAs at companies I worked at in the past. I didn't think it was worth the time, but I got paid either way.

I get that they have profit models that "prove" they lost money, but they all must be based on an assumption: that there was only this one chance to catch the customer's attention and never another. To me this is a very big assumption that simply cannot be proven and is probably wrong.

>How can you prove that the people didn't just come back 6 hrs later and spend the same time clicking and viewing the same ads?

Why do you assume the intelligent people who make decisions based on this haven't tracked spend by week, spend by month, regression models, and the five hundred other ways to try to model this? We're talking about trillion-dollar industries and you're assuming they don't do basic sanity checking.

I did start out saying I simply do not believe them. They don't have data from the time they were down, presumably. That is a big hole in the data for one instance.

>assuming they don't do basic sanity checking.

I'm sure they have a million ways to say it's not my fault. I started writing a huge response about human nature and the incentive to massage stats, but I think you will dismiss it. If a system is down, it's a great opportunity to get a blame hedge in case sales are not as good as you predicted. There is no incentive to take an honest look at whether a brief downtime really affected sales in a meaningful way. I doubt anyone even does look. They just say this is what it would have been in our models if the downtime had not happened.

dunning-kruger
Total Time Spent, Total Post Impressions, and Total DAP are all easy to measure and relatively predictable week over week, so it's fairly easy to see regressions. Outages definitely impact consumer apps, which makes sense if you stop to consider that a large chunk of time spent on platform comes during the downtime between other activities. Probably less so for commerce apps, especially ones that sell staple goods.
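
To make the week-over-week idea concrete, here is a rough sketch of one way such a check could work. It is not anyone's actual pipeline: the function name `outage_shortfall` and every number below are made up for illustration. It builds a per-weekday baseline from a few normal weeks and compares the outage week's total against it, so usage that was merely deferred to the next day shows no shortfall while usage that never came back does.

```python
from statistics import mean


def outage_shortfall(baseline_weeks, outage_week):
    """Compare an outage week against a per-weekday baseline.

    baseline_weeks: list of normal weeks, each a list of 7 daily totals.
    outage_week: list of 7 daily totals for the week containing the outage.
    Returns (expected_total, actual_total, shortfall_fraction).
    """
    # Average each weekday across the baseline weeks (Mon with Mon, etc.).
    expected_by_day = [mean(day_vals) for day_vals in zip(*baseline_weeks)]
    expected_total = sum(expected_by_day)
    actual_total = sum(outage_week)
    shortfall = (expected_total - actual_total) / expected_total
    return expected_total, actual_total, shortfall


# Hypothetical daily "time spent" in arbitrary units, Monday through Sunday.
baseline = [
    [100, 102, 101, 99, 105, 120, 118],
    [101, 100, 103, 100, 104, 121, 119],
    [99, 101, 102, 101, 106, 119, 117],
]

# Scenario A (made up): Tuesday dips, Wednesday rebounds by the same amount.
deferred = [100, 76, 127, 100, 105, 120, 118]
# Scenario B (made up): Tuesday dips and the usage never comes back.
lost = [100, 76, 102, 100, 105, 120, 118]

for name, week in (("deferred", deferred), ("lost", lost)):
    expected, actual, shortfall = outage_shortfall(baseline, week)
    print(f"{name}: expected={expected}, actual={actual}, shortfall={shortfall:.1%}")
```

With these invented numbers, the "deferred" week sums to the baseline and the "lost" week comes up short by the size of the Tuesday dip. A real analysis would also need to account for trend, seasonality, and normal week-to-week noise before attributing a gap to the outage.
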
Think of the extreme case: what if the site is down for a whole month, or a whole day? Then you can understand.

Another way: you have nothing else to do at the time, so you might click the ads. But now the site is down, so there are no ads for you to click.

I mean there was an article yesterday about people not buying/playing GTA5 because of a 5 minute load time.

Commercial web sites lose money when they are down. This is obvious to everyone.

For sure. My issue is that I think it's way overestimated. They have every incentive from the sales side to pump up that number as much as possible.
Atlassian?