HN Mirror



	by CleanCoder 1106 days ago
	Who owns the user-generated content? Would it be feasible to clone Reddit (the site) and populate it with content scraped directly from Reddit? Users could potentially even claim their existing usernames by verifying ownership of them on Reddit. Of course it's not easy, but a mass social migration might be more practical than a mass segregation in the hope that something else slowly gains traction.
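A minimal sketch of the username-claiming idea, assuming a challenge-token flow: the clone site issues a random token, the user posts it somewhere public on their Reddit profile, and the site checks that the token shows up there. `UsernameClaims`, `start_claim`, `complete_claim`, and the injected `fetch_profile_text` function are all hypothetical names; actually fetching a Reddit profile (and whether Reddit permits it) is deliberately left out and stubbed with a dict.

```python
import secrets
from typing import Callable, Dict


class UsernameClaims:
    """Hypothetical challenge-token flow for claiming a username that already
    exists on another site. This is not a Reddit API."""

    def __init__(self, fetch_profile_text: Callable[[str], str]) -> None:
        # Assumed helper: returns the public profile text for a username on the
        # original site. How it is fetched is out of scope for this sketch.
        self._fetch_profile_text = fetch_profile_text
        self._pending: Dict[str, str] = {}  # username -> outstanding token

    def start_claim(self, username: str) -> str:
        """Issue an unguessable token the user must post on their original profile."""
        token = "verify-" + secrets.token_hex(16)  # 16 random bytes = 32 hex characters
        self._pending[username] = token
        return token

    def complete_claim(self, username: str) -> bool:
        """Return True (and consume the token) if the token is visible on the profile."""
        token = self._pending.get(username)
        if token is None:
            return False
        if token in self._fetch_profile_text(username):
            del self._pending[username]
            return True
        return False


if __name__ == "__main__":
    # Stand-in for the remote site: a dict of profile texts keyed by username.
    profiles: Dict[str, str] = {}
    claims = UsernameClaims(lambda name: profiles.get(name, ""))

    token = claims.start_claim("CleanCoder")
    print(claims.complete_claim("CleanCoder"))  # False: token not posted yet
    profiles["CleanCoder"] = "About me ... " + token
    print(claims.complete_claim("CleanCoder"))  # True: token found and consumed
```

A real deployment would also want tokens to expire and would need to handle renamed or deleted accounts; the sketch only shows the core "prove you control the old account" check.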

3 comments

why-so-serious 1105 days ago

>Would it be feasible to clone Reddit (the site) and populate it with content scraped directly from Reddit?

Lol, no; this is why I rarely worry about developers encroaching on operations concerns. A completely trustworthy site (https://backlinko.com/reddit-users#how-many-comments-are-pub...) states that Reddit had 303 million posts and 2 billion comments in 2020. Could you imagine how long it would take, and how much you would need to spend on compute, to scrape 5+ million comments a day using something like Selenium? I am guessing it's a number approaching infinity. Plus, they would figure it out and just shut you down.

CleanCoder 1105 days ago

An interesting read (from HN today) about crawling a quarter billion webpages in 40 hours for $580, over 10 years ago.
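The two comments above can be set side by side with some quick arithmetic. This sketch only restates figures quoted in the thread (2 billion comments in 2020; 250 million pages in 40 hours for $580). Treating each comment as one fetch is an assumption made for the comparison, not a claim about how Reddit pages are structured.

```python
# Back-of-the-envelope rates from the figures quoted in this thread.
# Assumption: one fetch per comment (pessimistic); rate limits and blocking ignored.

COMMENTS_IN_2020 = 2_000_000_000  # per the backlinko figure cited above
DAYS_IN_2020 = 366                # 2020 was a leap year
CRAWL_PAGES = 250_000_000         # "a quarter billion webpages"
CRAWL_HOURS = 40
CRAWL_COST_USD = 580

SECONDS_PER_DAY = 24 * 60 * 60

comments_per_day = COMMENTS_IN_2020 / DAYS_IN_2020
needed_rate = comments_per_day / SECONDS_PER_DAY              # comments per second
crawl_rate = CRAWL_PAGES / (CRAWL_HOURS * 60 * 60)            # pages per second
cost_per_million_pages = CRAWL_COST_USD / (CRAWL_PAGES / 1_000_000)

print(f"comments per day to keep up: {comments_per_day:,.0f}")
print(f"comments per second:         {needed_rate:.1f}")
print(f"pages per second in crawl:   {crawl_rate:.0f}")
print(f"crawl cost per 1M pages:     ${cost_per_million_pages:.2f}")
print(f"crawl rate / needed rate:    {crawl_rate / needed_rate:.1f}x")
```

With these inputs the required rate works out to roughly 63 comments per second, against roughly 1,700 pages per second for the cited crawl. The comparison is loose in both directions: a single thread page can carry many comments, which favors the scraper, while driving a browser with Selenium is far slower per page than plain HTTP fetching, and Reddit can rate-limit or block you, as noted above.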

sjckciodjcr 1106 days ago

Reddit owns the content so you can’t do that.

islon 1106 days ago

Tell that to OpenAI...

hidden80 1106 days ago

I was about to ask this exact thing.