Hacker News
by why-so-serious 1105 days ago
>Would it be feasible to clone Reddit (the site) and populate it with content scraped directly from Reddit?

Lol, no; this is why I rarely worry about developers encroaching on operations concerns. A completely trustworthy site (https://backlinko.com/reddit-users#how-many-comments-are-pub...) states that Reddit had 303 million posts and 2 billion comments in 2020. Can you imagine how long it would take, and how much you would need to spend on compute, to scrape 5+ million comments a day using something like Selenium? I am guessing that it's a number approaching infinity. Plus, they would figure it out and just shut you down.

1 comment

Interesting read (from HN today) about crawling a quarter billion webpages in 40 hours, for $580, over 10 years ago.
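
For a rough sense of scale, here is a back-of-envelope sketch in Python that uses only the figures quoted in this thread (2 billion comments in 2020; a quarter billion pages crawled in 40 hours for $580). The function names are made up for illustration, and the comparison rests on loud assumptions: one fetch per comment, crawl costs that carry over unchanged, and no rate limiting or blocking. A browser-driven tool like Selenium would likely be much heavier per page than a plain crawler.

```python
# Back-of-envelope arithmetic using only the numbers quoted in the thread.
# None of these figures are independently verified.

COMMENTS_PER_YEAR = 2_000_000_000  # Reddit comments in 2020, per the linked stats page
CRAWL_PAGES = 250_000_000          # "a quarter billion webpages"
CRAWL_HOURS = 40
CRAWL_COST_USD = 580


def comments_per_day(per_year: int = COMMENTS_PER_YEAR, days: int = 365) -> float:
    """Average number of new comments per day."""
    return per_year / days


def crawl_pages_per_second(pages: int = CRAWL_PAGES, hours: int = CRAWL_HOURS) -> float:
    """Sustained throughput of the quoted crawl."""
    return pages / (hours * 3600)


def cost_per_million_pages(cost: float = CRAWL_COST_USD, pages: int = CRAWL_PAGES) -> float:
    """Dollar cost of the quoted crawl per million pages fetched."""
    return cost / (pages / 1_000_000)


def hours_to_fetch_daily_comments() -> float:
    """Hours needed to fetch one day of comments at the quoted crawl rate.

    Assumption (hypothetical): one fetch per comment, which is pessimistic,
    since a single page usually carries many comments.
    """
    return comments_per_day() / crawl_pages_per_second() / 3600


if __name__ == "__main__":
    print(f"comments per day:        {comments_per_day():,.0f}")
    print(f"crawl pages per second:  {crawl_pages_per_second():,.0f}")
    print(f"cost per million pages:  ${cost_per_million_pages():.2f}")
    print(f"hours per day of data:   {hours_to_fetch_daily_comments():.2f}")
```

Under those assumptions the quoted crawl ran at roughly 1,700 pages per second for a little over $2 per million pages, against about 5.5 million new comments a day. That says nothing about whether Reddit would tolerate the traffic, which is a separate problem.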