Hacker News
by mbmjertan 1100 days ago
Given that sama is a board member of Reddit Inc, and that this is happening after GPT-4 was trained on Reddit data, I wouldn't jump to the conclusion that they're upset at OpenAI.

SO had publicly available, no-auth-required data dumps. This makes it difficult for them to know who is using their data. However, this surely isn't the case for Reddit, which offered only API endpoints for this content, and I'm guessing you couldn't use .json to get the whole site (rate limits, etc.). I wouldn't be inclined to believe that Reddit would miss a major new API user.

This is purely speculation disregarding Hanlon's razor, but I'm thinking that the API pricing comes down to killing two birds with one stone.

* Sama got to train his LLM for free on Reddit and some of the best-of-the-internet content there, such as r/bodyweightfitness, informed discussions on niche topics, etc. The catch-up players face prohibitive pricing.

* Third-party apps get killed, bringing the UX to Reddit's control. I think this is more important to Reddit than ad revenue, as they could've simply built an SDK for probably less than this PR nightmare will cost us.

* Their new development platform, however, hints at the Reddit app supporting serving "redditor-made apps" which "can be seamlessly reused between communities".

The description (and the idea to have apps in your app) weirdly reminds me of WeChat apps, and given that Tencent is a major shareholder in Reddit, I would consider the possibility that apps are something they're pushing. That idea has no chance of success without the UX being completely in Reddit's hands, and even then it's questionable how it would work on Reddit.

Spez couldn't actually be that detached from Reddit?

5 comments

While Reddit Inc didn’t provide data dumps, pushshift.io published no-auth dumps of all site data back to 2008, which are still available on Academic Torrents.

Further, you could use the Reddit API to ingest the full firehose of all site data in real time without violating rate limits. This is one of the ways Pushshift made their datasets.

Reddit was a lot more open than any other site approaching their size!

> and given that Tencent is a major shareholder in Reddit

Snoop Dogg owns more of reddit than Tencent. The Tencent investment is very overblown.

Snoop Dogg's investment was $50,000,000.
Indeed. And Tencent was at $150,000,000 three rounds later when the valuation was 7x.

Basic math says Snoop owns twice as much.
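
For anyone who wants to check that arithmetic, here is a rough sketch using only the figures quoted in this thread. It assumes each investment buys a stake equal to amount divided by valuation and ignores dilution from the rounds in between, so treat it as back-of-the-envelope only. The function name `stake_ratio` is made up for illustration.

```python
from fractions import Fraction


def stake_ratio(early_investment, late_investment, valuation_multiple):
    """Ratio of the early investor's stake to the late investor's stake.

    Assumes stake = investment / valuation at the time of investing, and
    ignores any dilution between rounds (a simplifying assumption).
    """
    # early stake = early_investment / V
    # late stake  = late_investment / (valuation_multiple * V)
    # V cancels out when dividing the first by the second.
    return Fraction(early_investment * valuation_multiple, late_investment)


# Figures as quoted in the comments above: $50M early, $150M at a 7x valuation.
ratio = stake_ratio(50_000_000, 150_000_000, 7)
print(ratio)  # 7/3, i.e. a bit more than twice as much
```

Under those assumptions the ratio comes out a little above 2x, which is consistent with the "twice as much" claim; real cap tables would shift the exact number.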

Oh shit. Never thought of it that way. Sam is playing real 4D chess here. I’d add that he is also guaranteeing (to some extent) the quality of his data source (reddit) by closing API access, because AI bots can’t easily ruin reddit data without API access.
> I think this is more important to Reddit than ad revenue, as they could've simply built an SDK for probably less than this PR nightmare will cost us.

Who is "us"?

Oh, that's a typo I didn't catch. Sorry for the confusion, I'm not a native English speaker so words slip once in a while.
Is there anything with the new pricing approach that prevents Reddit from giving OpenAI more favorable negotiated rates that are not publicly disclosed?