by Capricorn2481 1102 days ago
I have not read a reddit devblog or anything like that. My guess is they are doing some kind of caching of posts via elasticsearch/redis or something similar. It must depend on the privacy settings of a sub.

So when 1000 subs go private, they potentially had to update hundreds of millions of comments.

Anyone else have a theory?

12 comments

Posted elsewhere, but yeah that's my theory too (having actually helped build it in the first place). My guess is the caches got blown out because the entire system is built on the idea that most people are looking at the most popular content.

When you close down all the most popular content, you have to dig deep into the long tail for fresh content.
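To make the long-tail point concrete, here is a toy simulation. It is not Reddit's actual caching: the key counts, the cache size, and the 90% "hot" share are all made-up assumptions. It replays requests through a small LRU cache, first with most traffic going to a handful of popular listings, then with all traffic spread over the long tail.

```python
import random
from collections import OrderedDict

def hit_rate(requests, capacity):
    """Replay requests through an LRU cache and return the fraction that hit."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests)

def make_requests(rng, n, hot_keys, tail_keys, hot_share):
    """Draw n requests: hot_share of them go to hot_keys, the rest to tail_keys."""
    return [
        rng.choice(hot_keys) if rng.random() < hot_share else rng.choice(tail_keys)
        for _ in range(n)
    ]

rng = random.Random(0)
hot = list(range(100))              # hypothetical popular listings
tail = list(range(100, 100_100))    # hypothetical long tail

normal = make_requests(rng, 20_000, hot, tail, hot_share=0.9)
blackout = make_requests(rng, 20_000, hot, tail, hot_share=0.0)

normal_rate = hit_rate(normal, capacity=1_000)
blackout_rate = hit_rate(blackout, capacity=1_000)
print(f"normal: {normal_rate:.2f}, blackout: {blackout_rate:.2f}")
```

With the popular keys gone, the same cache mostly misses, so nearly every request falls through to whatever sits behind it.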

Also, my guess is that the code for building homepages is not optimized for having a lot of skips due to private Reddits, since most people have probably never been subscribed to a private reddit, or if they have it wasn't for very long, or even if for very long, never more than one or two at a time.

Makes sense to me. I remember several lunch conversations at FB about whether or not we even could restart the site with totally cold caches. I’m sure it was possible but it wouldn’t have been clean or pretty.
We did restart reddit with cold caches a few times when I was there. It took hours to recover but eventually caught up if it happened near peak. If we did it during low traffic it wasn't too bad.

But that was 13 years ago, no idea how it would do today. :)

I’m sure Facebook could too (their disaster recovery planning and drills were always top notch) but I’m glad I won’t be oncall if it ever happens :-D
> I remember several lunch conversations at FB about whether or not we even could restart the site with totally cold caches

so you're saying, we just have to knock it out once, and it'll be gone forever? that makes things easier ...

That makes sense to me. It would be one variant of a larger theory of "a subreddit going private is an expensive process that is poorly optimized because it's normally so rare". Another variant would be that clicking the "go private" button might do something like start a huge cascade of queries/calls to mutate the status of individual posts and comments (and possibly hence also trigger the massive cache invalidation you mentioned). Possibly also "going private" infra, whatever it is, is some bespoke snowflake that doesn't autoscale (since it's never needed to and there was always something better to be doing with engineering time, or people just forgot).

One theory I have seen floated but which seems unlikely to me is that this was some kind of internal sabotage. I could maybe buy this if the protest was about a war or human rights issue or something, but I really don't think any IT pros would be willing to risk jail time for this one.

Would a better protest be rapidly switching subreddits from public/private?
A better protest would have been for the moderators to all stop moderating. Turn off their automod filters, allow all posts. Show reddit what happens when their volunteer moderators just stop.
I still think this is coming. After the 48hr strike, how many mods are going to turn their sub back to public, look at their mod inbox filled with confused and angry users, and then see some pending non-response from Reddit post-strike, and wonder why they do this?
The problem with this strategy is that it gives reddit a valid excuse to remove subreddit mods.

Having your sub be unmoderated is against reddit TOS, whereas taking a sub private is not yet against the official TOS.

Process and technicalities like this matter a lot and can affect a platform's actions.

I agree that taking the subs private for 48 hours is a good first step. And I agree that subs staying private indefinitely is a good next step.

But if neither makes headway, just going unmoderated is the logical next step. Yeah, reddit could take over moderating the subreddits. They can also do that if they're set private. But they don't want to. They outsourced that to volunteer moderators so they could get free labor. If they want to run their own moderation, and submit their own content, the reddit corporation can have fun with that.

You never want to threaten people with a good time.
From Reddit corporate's perspective, all of their free-labor mods disappearing is _not_ "a good time".

Really shouldn't be anyone's idea of a good time.

every sub would then be nothing but onlyfans ads or similar spam
Too easy to shut down, since there's a finite number of mod enabled accounts.
> to mutate the status of individual posts and comments

AKA why you should normalize your data models.

Meh it's a tradeoff usually. De-normalize for better read performance at the expense of some complexity and worse update performance/semantics.
This is true, but FWIW (it probably wouldn't save you here, too much is changing too fast) something like a SQL materialized view can very often give you both the chocolate and the peanut butter.
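A minimal sketch of that tradeoff, using an in-memory SQLite database. The schema is hypothetical (nobody here knows Reddit's real data model), and SQLite has no materialized views, so this only contrasts the two plain layouts: a privacy flag stored once on the sub versus copied onto every post.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE subs (id INTEGER PRIMARY KEY, is_private INTEGER NOT NULL);
    -- Normalized: a post only references its sub.
    CREATE TABLE posts_norm (id INTEGER PRIMARY KEY, sub_id INTEGER NOT NULL);
    -- Denormalized: each post carries its own copy of the privacy flag.
    CREATE TABLE posts_denorm (
        id INTEGER PRIMARY KEY, sub_id INTEGER NOT NULL, is_private INTEGER NOT NULL
    );
""")
db.execute("INSERT INTO subs VALUES (1, 0)")
db.executemany("INSERT INTO posts_norm VALUES (?, 1)", [(i,) for i in range(10_000)])
db.executemany("INSERT INTO posts_denorm VALUES (?, 1, 0)", [(i,) for i in range(10_000)])

# Going private, normalized: one row changes; reads pay for a join instead.
norm_rows = db.execute("UPDATE subs SET is_private = 1 WHERE id = 1").rowcount

# Going private, denormalized: every post row in the sub is rewritten.
denorm_rows = db.execute(
    "UPDATE posts_denorm SET is_private = 1 WHERE sub_id = 1"
).rowcount

# Both layouts answer "how many posts are visible?" the same way afterwards.
visible_norm = db.execute("""
    SELECT COUNT(*) FROM posts_norm p JOIN subs s ON s.id = p.sub_id
    WHERE s.is_private = 0
""").fetchone()[0]
visible_denorm = db.execute(
    "SELECT COUNT(*) FROM posts_denorm WHERE is_private = 0"
).fetchone()[0]

print(norm_rows, denorm_rows, visible_norm, visible_denorm)
```

The normalized write touches 1 row and the denormalized write touches 10,000, which is the "worse update performance" side of the trade; the denormalized read skips the join.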
I'm skeptical. Most of the outages I've been involved in have been for way dumber reasons than that. Based on my dumb experience, my guess is that the handling for a user trying to access a private subreddit stressed some system that doesn't make sense to even exist.
My theory is that the algorithm that produces the front page relies on the largest subreddits all being public. But if it can't draw on posts from the largest subs, it'll have to dip into a bunch of smaller subs, and the part of the algorithm where it delves deeper and deeper into less popular subreddits is terribly optimized and puts a huge strain on the servers.
Cache invalidation makes sense.

Could also be some kind of fanout query that assumes the top subreddits are open and wastes CPU cycles.

It could even be that lots of queries to generate pages just fetch private posts and comments and then skip them. The assumption was that users will not be subscribed to lots of subreddits that they don't have access to, so skipping these posts is typically negligible and not worth using a more complex query for. If the fraction of skipped items becomes significant, these queries will be doing lots more work. It's pretty believable that some very common query getting 25-50% more expensive would need more server capacity than Reddit has available.
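A rough sketch of that cost, assuming the page builder simply walks a ranked candidate list and drops posts the user cannot see. The function names and the ten-sub toy data are hypothetical, not Reddit's code.

```python
def build_page(candidates, is_visible, page_size=25):
    """Scan ranked candidates, skip hidden ones, stop when the page is full.

    Returns (page, scanned), where scanned counts every candidate examined.
    """
    page, scanned = [], 0
    for post in candidates:
        scanned += 1
        if is_visible(post):
            page.append(post)
            if len(page) == page_size:
                break
    return page, scanned

# Hypothetical ranked posts as (post_id, subreddit) pairs, spread over 10 subs.
posts = [(i, f"sub{i % 10}") for i in range(1000)]

def scanned_with(private_subs):
    """Count how many candidates must be examined to fill one page."""
    _, scanned = build_page(posts, lambda p: p[1] not in private_subs)
    return scanned

none_private = scanned_with(set())
half_private = scanned_with({"sub0", "sub1", "sub2", "sub3", "sub4"})
most_private = scanned_with({f"sub{i}" for i in range(9)})
print(none_private, half_private, most_private)  # 25 50 250
```

If a fraction p of candidates is skipped, filling a page of N posts takes roughly N / (1 - p) scans, so p = 0.2 is about 25% more work and p = 1/3 is about 50% more.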

Another possible explanation is that operations on private subreddits are not as optimized, since they are expected to have a small number of subscribers and take up a small share of total resources. These assumptions would be flipped.

I’ve noticed that my comment history in privatized subreddits is now unavailable, so I wonder if it’s related to that.
the cynic in me says, they're being DDOSd (I mean, they pissed off millions of internet nerds who have their own definition of justice), and instead of admitting it they're using the opportunity to blame the subreddit mods.

then again, not sure why that messaging would be any better for them than admitting to a DDOS...

Blame the user makes sense because they don't want to show how fragile Reddit is while they're trying to attract investors for the IPO.
Here's a guess from a former reddit dev.

> (I used to work as a backend developer at Reddit - I left 6 years ago but I doubt the way things work has changed much)

> I think it's extremely unlikely that this is deliberate. The way that Reddit builds "mixed" subreddit listings (where you see posts from multiple subreddits, like users' front pages) is inefficient and strange, and relies heavily on multiple layers of caches. Having so many subreddits private with their posts inaccessible has never happened before, and is probably causing a bunch of issues with this process

https://tildes.net/~tech/163e/reddit_appears_to_be_down_duri...

Unrelated, but any way to get a tildes invite these days? I asked on Reddit but then shut my sub down and have avoided going back
Exact same boat here.
That was my thinking as well, and assuming they're still cloud-heavy (it's been years since I've checked) this could potentially be an expensive event for them. That opens up a very interesting form of protest/attack that reddit really won't like: "privating" then "un-privating" a subreddit several times a day.
Twitter was like this at one point, flipping over from public to private was one of the most resource intensive things an individual user could do... especially with a long tweet history, because it involves changing every tweet.
My theory is that some middleware applications have policies to scale to zero or one based on traffic patterns, when scaled down due to lack of traffic things start breaking.
The subreddits that are most popular are on high performance servers while the subreddits that are less popular are on lower performance servers?
Well that is how wallstreetbets freezes the subreddit to log all of the potentially actionable comments.