HN Mirror



	by toomuchtodo 233 days ago
	Without gating AI scraper access, Reddit’s enterprise value, based only on ad revenue, is greatly diminished. If the AI folks impair Reddit’s economics through their maneuvers, that might not be so bad, given that Reddit’s behavior of late has been “all this user-generated content belongs to us to monetize as we see fit.”


sergenj 233 days ago

The AI companies could just pull the content from Reddit mirrors like https://arctic-shift.photon-reddit.com/search/ and https://search.pullpush.io/. It's not difficult to scrape these mirrors, nor to acquire archives of all Reddit posts and comments.

toomuchtodo 233 days ago

They would most likely use the browsers they offer users to scrape the content and stream it back to an endpoint for ingestion and processing as users browse Reddit. Think of the RECAP extension for PACER (which scrapes PACER while a user browses it and ships the data to the Internet Archive), or ArchiveTeam’s Warrior VM. You can’t defend against scraping when every user’s browser, which looks like a human because a human is operating it, is a crawler node.

At least, this is how I would engineer a public browser operating as an adversarial distributed crawler network.
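One practical consequence of this architecture is that the ingest endpoint receives many copies of the same page, since many users browse the same threads. Below is a minimal Python sketch of deduplication on the ingest side, assuming each capture arrives as a (URL, HTML) pair. `Capture`, `IngestStore`, and `submit` are hypothetical names for illustration, not part of RECAP, ArchiveTeam’s Warrior, or any real service.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capture:
    """One page snapshot reported by a client browser (hypothetical schema)."""
    url: str
    html: str


@dataclass
class IngestStore:
    """In-memory stand-in for the ingest endpoint's backing store (hypothetical)."""
    # url -> SHA-256 digests of page bodies already ingested for that URL
    seen: dict[str, set[str]] = field(default_factory=dict)
    # Distinct captures kept for downstream processing, in arrival order
    accepted: list[Capture] = field(default_factory=list)

    def submit(self, capture: Capture) -> bool:
        """Store a capture unless an identical body for the same URL was already ingested.

        Returns True if the capture was new, False if it was a duplicate.
        """
        digest = hashlib.sha256(capture.html.encode("utf-8")).hexdigest()
        digests = self.seen.setdefault(capture.url, set())
        if digest in digests:
            return False  # another client already sent this exact version of the page
        digests.add(digest)
        self.accepted.append(capture)
        return True
```

Keying on both URL and content hash means an edited comment thread is stored as a new version, while identical re-visits by other users are dropped.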