Hacker News
by quatrefoil 823 days ago
The API situation seemed baffling to me at the time. The timing wasn't coincidental and it was clear that they were responding to people training LLMs on the Reddit corpus.

But here's the thing: prevailing HN sentiments notwithstanding, your average Redditor leans left and is fairly anti-big-tech, so Reddit could have leveraged this angle. They could've said it's a pro-user move to stop OpenAI and the like from unfairly profiting off your work. And most users would have applauded.

But Reddit didn't say that. They took a PR hit and decided to wait it out. The cynical explanation was that they were actually just trying to get some of that LLM money for themselves. And not long ago, they announced a big deal with Google to give access to user data for training purposes: https://www.reuters.com/technology/reddit-ai-content-licensi...

Frankly, I was on the fence about the API access thing until the motivation became clear.

2 comments

This doesn't track. They spent a lot of time and energy coordinating with the authors of popular mobile clients, and could easily have extended some means of letting them continue to operate given that they were clearly not harvesting content for themselves. Meanwhile, content can still be harvested for LLM training without the API (by using the HTML site).

It seems like the real intent was to regain control over the surfaces users use to consume the site, especially on mobile.

Scraping is a lot more dicey than using an official API. Why did Google enter that partnership? They have the data in their index. The only conceivable reason is that they prefer to pay Reddit to avoid the risk of litigating it and ending up with some unfavorable precedent.

There's more than just the data you see, remember; the data you don't see is also valuable: the DMs, the deleted posts, the advertising IDs that link people to their accounts and to their alt accounts, etc.

To avoid litigation and the possibility of an injunction. On the other hand, the money can fund Reddit's litigation against others: now that they have proven someone paid, they can try to stop others from using the data, slowing them down in the process. And the sum is peanuts for Google; they waste that amount regularly...

Scraping and an official API are equally "dicey" in that the method isn't relevant to the question of how the content itself is licensed to be used.

> The timing wasn't coincidental and it was clear that they were responding to people training LLMs on the Reddit corpus.

Hard disagree, they were lying about their motives, plain and simple. Their claimed motivation doesn't match what they did/didn't actually do.

For example, the biggest companies training LLMs could be dissuaded by simply changing the terms-of-service to prohibit that usage (skipping all that developer-labor and community protest) but Reddit didn't do that. (Dodgy companies that don't care aren't relevant since they'll just scrape the website even without API access.)

In contrast, "Reddit arbitrarily killed third-party apps to force their own app" does match what the company implemented: Abrupt and punishing new fees, mandating that third-party apps can't contain any ads of their own, and making certain categories of content exclusive to their own app.