| There are well over a million subreddits. Even counting just large groups, there are likely several thousand subreddits with their own specific focuses and moderation criteria. Reddit reports 100,000+ active subreddits.[1]

The two problems with moving to an in-house, wage-labour moderation team are that it is expensive, and that wage labour at prices Reddit is likely willing to pay will not meet the standards of dedicated volunteer teams.

From various sources I've encountered over the years, human-based moderation peaks at somewhere between 500--1,000 items/day per moderator (multiple sources put the peak at about 700--800, though that's with very thin review).

Reddit ... doesn't seem to offer stats on daily / monthly comment volume, though it claims ~60m DAU and 13 billion posts and comments overall. I'm going to SWAG[2] and assume roughly half of those have occurred in the past five years, which would mean that there are ... about 3.6 million posts / comments per day. Which seems low, so my SWAG's probably wrong. If 13 billion items are posted per year, then there are ~35 million items posted per day. That seems possibly high, though Facebook's claim is 5 billion items/day, so ... maybe? *shrug*

One criterion I've suggested for moderation elsewhere is based on prevalence, which is the number of times an item is viewed. Short version: prevalence follows a power-law distribution, and as the views threshold is raised, the number of items above it falls off drastically. With some tuning and adjustments (e.g., risk-rating comments to raise or lower estimated harms), it's possible for a finite moderation team to offer an SLA[3] that content above a given prevalence threshold will be reviewed. It's also possible to set holds such that content reaching that threshold is withheld from further visibility until it is reviewed (say, if some specific item starts taking off), which effectively throttles visibility of content and scales it to the limited moderation resource.
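A minimal sketch of that threshold-and-hold idea, in Python. Everything here is an assumption for illustration: the names `review_threshold` and `needs_hold` are made up, and the view counts are synthetic (views proportional to 1/rank), not Reddit data.

```python
def review_threshold(view_counts, capacity):
    """Smallest view count T such that at most `capacity` items have views >= T."""
    if capacity < 0:
        raise ValueError("capacity must be non-negative")
    ranked = sorted(view_counts, reverse=True)
    if len(ranked) <= capacity:
        return 0  # the team can review everything
    # ranked[capacity] is the most-viewed item the team cannot afford to review,
    # so the threshold has to sit just above its view count.
    return ranked[capacity] + 1


def needs_hold(views, threshold, reviewed):
    """True if an item has reached the threshold without having been reviewed."""
    return views >= threshold and not reviewed


# Synthetic data (an assumption): 100,000 items with views falling off as 1/rank.
views = [1_000_000 // rank for rank in range(1, 100_001)]

threshold = review_threshold(views, capacity=1_000)
held = sum(1 for v in views if v >= threshold)
covered = sum(v for v in views if v >= threshold) / sum(views)

print(f"threshold: {threshold} views, items to review: {held}")
print(f"share of all views covered by review: {covered:.0%}")
```

With views falling off as 1/rank, raising the threshold 10x cuts the number of items above it by roughly 10x, so capacity can be traded directly against the threshold.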

(I'm not aware of any UGC[4] service applying this model to moderation, but it is one which strongly suggests itself. It is effectively what a gate-kept editorial model applies, e.g., where an editor specifically reviews all incoming entries from a "slush pile"[5].)

Going back to my content numbers above, a 35 million items/day content stream and a moderation team whose members each review 500 items/day (roughly 1 minute per item on average) ... would require a 70,000-member moderation team, which is likely prohibitive for Reddit.[6] A prevalence threshold set such that 10% of all items require human review reduces that to 7,000, still likely high, and a 1% review (which would still cover the overwhelming majority of all content presentations) to a somewhat more tractable 700.

From third-party and my own sources there's a roughly inverse relationship between content items and prevalence, such that increasing prevalence 10x reduces the number of individual content items by a factor of 10. For reference, looking at Hacker News historical front pages and the votes and comments of the 1st- and 30th-ranked stories, we see about 6.3x more votes and 3.8x more comments on the 1st-ranked story.[7] For Google+, a near-logarithmic scaling of number of communities vs. size was noted.[8]

________________________________

Notes:

1. As of January 2021: <https://www.redditinc.com/>
2. <https://en.wikipedia.org/wiki/Scientific_wild-ass_guess>
3. Service level agreement, basically a guaranteed minimum service level.
4. User-generated content.
5. <https://www.writersdigest.com/getting-published/what-is-the-...>
6. The somewhat better-capitalised Facebook is reported to have 7,500 moderators: <https://www.vice.com/en/article/xwk9zd/how-facebook-content-...>
7. Own data based on a crawl of all HN front page "past" listings from 2007-2-20 through 2023-6-13, with 178,642 stories.
8. Own data based on a crawl of all extant ~8.1 million Google+ Communities, data provided by Friends+Me creator.
The data actually show far fewer large communities than a strictly log-log relation would suggest, for reasons that are unclear. See: <https://diaspora.glasswings.com/posts/ab6a5470f57001368d4002...> |
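
The back-of-envelope arithmetic in the comment above can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the inputs are the rough figures quoted in the comment, not measured values.

```python
import math


def daily_items(total_items, years):
    """Average items per day if `total_items` were posted over `years` years."""
    return total_items / (years * 365)


def moderators_needed(items_per_day, per_moderator_per_day, review_fraction=1.0):
    """Head count needed to review `review_fraction` of the daily item stream."""
    to_review = round(items_per_day * review_fraction)
    return math.ceil(to_review / per_moderator_per_day)


# Two guesses at volume, both starting from Reddit's "13 billion" figure.
print(f"half of 13bn over 5 years: {daily_items(6_500_000_000, 5):,.0f} items/day")
print(f"13bn in a single year:     {daily_items(13_000_000_000, 1):,.0f} items/day")

ITEMS_PER_DAY = 35_000_000  # the rounded per-year guess used in the comment
PER_MODERATOR = 500         # low end of the 500--1,000 items/day range

for fraction in (1.0, 0.10, 0.01):
    team = moderators_needed(ITEMS_PER_DAY, PER_MODERATOR, fraction)
    print(f"{fraction:>4.0%} of items reviewed -> {team:,} moderators")
```

Swapping in 1,000 items/day per moderator (the optimistic end of the range) halves each head count.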
A discussion elsewhere had me looking at how much front-page HN activity is attributable to what number of profiles. Using my crawl data mentioned above:
That very nearly perfectly follows the rule I'd given above: reducing the items by a factor of 10 (here: number of front-page posts) increases the submitters by about a factor of 10 (roughly: 2, 200, 2,000, 20,000).

Half of all HN front-page stories since 2007 were submitted by just 2,092 profiles, of the 43,598 represented in all front-page stories. As of 2021, Whaly.io found 767,496 active profiles (post or comment activity) since 2005: <https://whaly.io/posts/hacker-news-2021-retrospective>.
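
For anyone wanting to run the same kind of tally on their own crawl, here is one way it might be done in Python. This is a sketch, not the script behind the numbers above: `submitters_for_share` is a made-up name, and the sample list is toy data standing in for one submitter name per front-page story.

```python
from collections import Counter


def submitters_for_share(submitters, share):
    """Fewest profiles whose combined stories reach `share` of all stories.

    `submitters` holds one entry per story (the submitting profile).
    """
    total = len(submitters)
    if total == 0:
        return 0
    # Most prolific profiles first, then accumulate until the share is reached.
    counts = sorted(Counter(submitters).values(), reverse=True)
    running = 0
    for n_profiles, n_stories in enumerate(counts, start=1):
        running += n_stories
        if running / total >= share:
            return n_profiles
    return len(counts)


# Toy data (an assumption): 10 stories from 4 profiles.
stories = ["a"] * 5 + ["b"] * 3 + ["c", "d"]

for share in (0.5, 0.8, 1.0):
    n = submitters_for_share(stories, share)
    print(f"{share:.0%} of stories come from the top {n} profile(s)")
```

Run against a real crawl, the same loop over several shares gives the profiles-versus-stories figures directly.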