Hacker News
by CleanCoder 1106 days ago
Who owns the user-generated content? Would it be feasible to clone Reddit (the site) and populate it with content scraped directly from Reddit? Users could potentially even claim the same username by verifying that they own it on Reddit. Of course it's not easy, but a mass social migration might be more practical than mass fragmentation in the hope that something else slowly gains traction.

>Would it be feasible to clone Reddit (the site) and populate it with content scraped directly from Reddit?

Lol, no; this is why I rarely worry about developers encroaching on operations concerns. A completely trustworthy site (https://backlinko.com/reddit-users#how-many-comments-are-pub...) states that Reddit had 303 million posts and 2 billion comments in 2020. Can you imagine how long it would take, and how much you would need to spend on compute, to scrape 5+ million comments a day using something like Selenium? I'm guessing it's a number approaching infinity. Plus, they would figure it out and just shut you down.
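The "5+ million comments a day" figure can be checked against the yearly total. A quick sketch, assuming the 2 billion comments were spread evenly over 2020 (the source only gives the yearly total):

```python
# Back-of-envelope: turn the quoted 2020 yearly comment total into daily and per-second rates.
COMMENTS_PER_YEAR = 2_000_000_000  # figure quoted in the comment above
DAYS_IN_2020 = 366                 # 2020 was a leap year
SECONDS_PER_DAY = 86_400

# Assumption: comments are spread evenly across the year.
comments_per_day = COMMENTS_PER_YEAR / DAYS_IN_2020
comments_per_second = comments_per_day / SECONDS_PER_DAY

print(f"{comments_per_day:,.0f} comments/day")
print(f"{comments_per_second:,.1f} comments/second")
```

This prints 5,464,481 comments/day and 63.2 comments/second, so "5+ million a day" is consistent with the quoted total.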

An interesting read (on HN today) about crawling a quarter billion webpages in 40 hours for $580, over 10 years ago.
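For a rough sense of scale, here is a naive extrapolation from that crawl's figures to the Reddit totals quoted earlier in the thread. It assumes one fetch per post or comment, the same throughput and price as that crawl, and no rate limiting or blocking. None of these assumptions holds in practice, so treat the result as an order-of-magnitude sketch only:

```python
# Crawl figures as quoted in the thread: ~250 million pages, 40 hours, $580.
CRAWL_PAGES = 250_000_000
CRAWL_HOURS = 40
CRAWL_COST_USD = 580

# Reddit totals for 2020 as quoted earlier in the thread.
REDDIT_ITEMS = 303_000_000 + 2_000_000_000  # posts + comments

pages_per_second = CRAWL_PAGES / (CRAWL_HOURS * 3600)
usd_per_page = CRAWL_COST_USD / CRAWL_PAGES

# Assumptions: one fetch per post or comment (pessimistic, since a thread page
# usually holds many comments), the same throughput and price as the old crawl,
# and no rate limiting or blocking by the target site.
hours_needed = REDDIT_ITEMS / pages_per_second / 3600
cost_needed = REDDIT_ITEMS * usd_per_page

print(f"{pages_per_second:,.0f} pages/second")
print(f"{hours_needed:,.0f} hours, about ${cost_needed:,.0f}")
```

Under those assumptions this comes to roughly 1,736 pages/second, 368 hours (about 15 days), and about $5,343.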
Reddit owns the content, so you can’t do that.
Tell that to OpenAI...
I was about to ask this exact thing.