I am convinced that Reddit's status page underreports outages. There are a lot of times I can't access the site for 15 minutes, but it never shows up on here. (And I don't even visit it that often.)
I think they're the major company with the fewest 9s of uptime.
> I think they're the major company with the fewest 9s of uptime.
I think this possibly shows that they have a good understanding of what's most important to their users. If they were to ensure the application was highly available (think six 9s), then they would be sacrificing resources elsewhere in order to achieve this. They could recognize that an outage will not be a "life or death" event for their users and have decided the trade-off was worth it.
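For a rough sense of what "six 9s" would demand, here is a minimal sketch of the arithmetic. It assumes a 365-day year and that availability is measured as a plain fraction of wall-clock time; the function name is just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def allowed_downtime_seconds(nines: int, period_seconds: float = SECONDS_PER_YEAR) -> float:
    """Downtime budget for an availability of N nines (e.g. 3 -> 99.9%)."""
    # The unavailable fraction for N nines is 10^-N.
    return period_seconds * 10 ** (-nines)


for n in range(2, 7):
    budget = allowed_downtime_seconds(n)
    print(f"{n} nines: {budget:.1f} s/year ({budget / 3600:.2f} h/year)")
```

Under those assumptions, six 9s leaves roughly 31.5 seconds of downtime per year, while three 9s leaves about 8.8 hours, which is why the extra 9s get expensive quickly.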
Counterpoint: Or, they could just be lazy (or incapable) and haven't tried to solve for higher availability. This is not likely.
> I am convinced that Reddit's status page underreports outages
If they are like every other major company out there, status page reports are a function of a manager's or PM's judgment call, not the output of a monitoring system.
Does anyone know why Reddit seems to be alone among the world’s most popular sites in having so many outages and frequent periods of slowness? I’ve been on it for more than a decade and its unreliability has been fairly constant. Is the engineering team more resource constrained than others or held back by unusual amounts of legacy code, or is it simply confirmation bias resulting from earlier experiences?
"I can answer this! According to various sources (mainly, complaints on the redesign subreddit), the new.reddit interface is a lot less tolerant of timeouts and delays. Old.reddit was much more relaxed and would wait longer before throwing up errors (like logouts and such), but new.reddit is RESPONSIVE!, and the backend isn't playing nice with the new, tighter tolerances."
Even if it were truly the reason (it's just someone uninvolved who saw a lot of complaints about the new interface), it doesn't explain the rest of the decade, when there was no "new interface" and it was still a meme that Reddit was constantly down.
Is that the way 'responsive' is being used in this particular context? What does reflowing a webpage when the browser's window changes size have to do with these timeouts and delays in the backend?
There's some ironic use of the word in the post above. They are mocking the fact that the new and improved "responsive web" site isn't responsive in the speed sense.
To me, it feels like outages are much less frequent than they were 7-10 years ago. It used to be a couple times a month. I don't remember the last time I saw one of their snoo error pages.
Outages are definitely less common than they used to be. But I would guess that reddit is one of the most popular sites on the internet because it's basically like an anonymous(ish) facebook with content guaranteed to be popular with the majority (by design).
They've also made major efforts to keep the site socially sane for the majority of users (for better or worse, IMO generally better) with their moderation model.
Whether or not it's a great site technically is irrelevant. It's something people want and something I keep coming back to because there's no equivalent when it comes to niche communities.
>keep coming back to because there's no equivalent when it comes to niche communities.
Niche specific forums are better at everything niche communities want/need in a discussion space except bringing on new users. Considering the kinds of users you get when you're in close proximity to a colossus of internet riff-raff I don't think this is a big tradeoff.
There was an AMA from the engineering team. I don't remember the technical details, but in the picture there were at most 10 people or so. That seems a little understaffed for a website at this scale.
Anecdotal, but old.reddit.com (besides search) seems to always work during these outages. I'm guessing they're always knocking out the AJAX and search machines for some reason
I gave it some thought. I’ve distilled the reddit value proposition for me at least.
Imagine having to manage a separate login for a different forum for every topic you’re interested in. And having to check 10+ forums a day for responses to your comments.
With reddit it’s a common interface to every topic I’m interested in. I’m subscribed to about 60 subreddits now, from off grid cabins to deep learning to Spacex.
>Imagine having to manage a separate login for a different forum for every topic you’re interested in. And having to check 10+ forums a day for responses to your comments.
Bookmarks folder + Browser password manager + email notifications solves this for N forums where N is less than ~20.
Sure, it's harder to have a really shallow level of engagement in a community if it's not all algorithmically curated on one page (you actually have to go to the forums and check what's up vs. seeing just the popular stuff in front of you), but I think for most interests that's probably a good thing for the community around that interest.
Good find, although that would assume that every hour of the year is equally contributing to revenue. In reality I would think that an hour-long outage in the morning as everyone in N. America (still the largest percentage of reddit users) is reading reddit while waking up/drinking coffee is probably more costly than an hour-long outage while most of those users are asleep.
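To illustrate the point, here is a rough sketch of weighting an outage by when it happens. The hourly weights are entirely made up (they are not Reddit traffic or revenue data); they only encode the guess that mornings are busier than the middle of the night.

```python
# Hypothetical relative activity per hour of day (index 0 = midnight).
# These numbers are invented for illustration only.
HYPOTHETICAL_WEIGHTS = [1] * 6 + [4] * 4 + [2] * 14


def outage_share(hourly_weights, start_hour, duration_hours):
    """Fraction of a day's weighted activity lost to an outage."""
    total = sum(hourly_weights)
    lost = sum(
        hourly_weights[(start_hour + i) % len(hourly_weights)]
        for i in range(duration_hours)
    )
    return lost / total


morning = outage_share(HYPOTHETICAL_WEIGHTS, start_hour=7, duration_hours=1)
night = outage_share(HYPOTHETICAL_WEIGHTS, start_hour=3, duration_hours=1)
print(f"morning outage: {morning:.1%} of the day, night outage: {night:.1%}")
```

With a flat profile every hour costs the same 1/24 of the day; with any peaked profile like the invented one above, the same hour-long outage costs several times more at the peak than overnight.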
Looking forward to this postmortem!
Looks like their request rate has dropped by like 80%-90%!!
Don't know if that includes CDN requests or what; I'm assuming it's just the reddit.com domain.
I believe comments, posts, and votes are placed into a queue and later (seconds usually) committed to the database. They're not visible to the public until they're committed (but they make the UI act like it's already there for the poster themselves). Just a guess from my observations though.
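A minimal sketch of the behavior I'm describing, purely as a toy model of my guess and not Reddit's actual implementation (the class and method names are hypothetical):

```python
from collections import deque


class QueuedCommentStore:
    """Toy model: writes sit in a queue until a worker commits them."""

    def __init__(self):
        self.pending = deque()  # (author, text) not yet in the "database"
        self.committed = []     # (author, text) visible to everyone

    def submit(self, author, text):
        # The write is only enqueued here; nothing is committed yet.
        self.pending.append((author, text))

    def commit_pending(self):
        # Stand-in for the background worker draining the queue.
        while self.pending:
            self.committed.append(self.pending.popleft())

    def visible_to(self, viewer):
        # Everyone sees committed items; authors also see their own
        # pending items, so the UI looks instant to the poster.
        own_pending = [item for item in self.pending if item[0] == viewer]
        return list(self.committed) + own_pending


store = QueuedCommentStore()
store.submit("alice", "first!")
print(store.visible_to("alice"))  # alice sees her own pending comment
print(store.visible_to("bob"))    # bob sees nothing until the commit
store.commit_pending()
print(store.visible_to("bob"))
```

If the queue worker stalls during an outage, a model like this would explain why your own comment appears to post while nobody else can see it.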