Hacker News
by fart-fart-FART 241 days ago
>Here’s where we push back. Reddit told the press we ignored them when they asked about licensing. Untrue. Whenever anyone asks us about content licensing, we explain that Perplexity, as an application-layer company, does not train AI models on content. Never has. So it is impossible for us to sign a license agreement to do so.

I wish they had told reddit to go fuck itself and taken that to court.

unlike the new york times lawsuit - where the platform owns their content and training is a gray area - reddit doesn't own shit. and if they insist otherwise - bye bye section 230 protections, no? they now retroactively own every post in r/jailbait and r/coontown.

1 comment

Without gating AI scraper access, Reddit’s enterprise value, based only on ad revenue, is greatly diminished. If the AI folks impair Reddit’s economics through their maneuvers, that might not be so bad (as Reddit’s attitude of late has been “all this user-generated content belongs to us to monetize as we see fit”).
The AI companies could just pull the content from Reddit mirrors like https://arctic-shift.photon-reddit.com/search/ and https://search.pullpush.io/. Reddit is not difficult to scrape, and archives of all Reddit posts and comments are not difficult to acquire.
They would most likely use the browsers they offer users to scrape the content and stream it back to an endpoint for ingest and processing as users browse Reddit. Think of the RECAP the Law extension for PACER (which scrapes PACER while a user browses it and ships the data to the Internet Archive) or ArchiveTeam’s Warrior VM. You can’t defend against scraping when every user's browser, which looks like a human because it is a human, is a crawler node.

At least, this is how I would engineer a public browser operating as an adversarial distributed crawler network.