Given that sama is a board member of Reddit Inc, and that this is happening after GPT-4 was trained on Reddit data, I wouldn't jump to the conclusion that they're upset at OpenAI. SO had publicly available, no-auth-required data dumps, which makes it difficult for them to know who is using their data. That surely isn't the case for Reddit, which offered this content only through API endpoints, and I'm guessing you couldn't use .json to get the whole site (rate limits, etc.). I find it hard to believe that Reddit would miss a new major API user.

This is pure speculation that disregards Hanlon's razor, but I think the API pricing comes down to killing two birds with one stone:

* Sama got to train his LLM for free on Reddit, including some best-of-the-internet content such as r/bodyweightfitness and informed discussions on niche topics. Players trying to catch up now face prohibitive pricing.
* Third-party apps get killed, bringing the UX under Reddit's control. I think this matters more to Reddit than ad revenue, as they could have simply built an SDK for probably less than this PR nightmare will cost us.
* Their new development platform, however, hints at the Reddit app serving "redditor-made apps" which "can be seamlessly reused between communities". The description (and the idea of having apps inside your app) weirdly reminds me of WeChat apps, and given that Tencent is a major shareholder in Reddit, I would consider the possibility that apps are something they're pushing. That idea has no chance of success unless the UX is completely in Reddit's hands; even then, it's questionable how it would work on Reddit.

Could Spez actually be that detached from Reddit?
Further, you could use the Reddit API to ingest the full firehose of all site data in real time without violating rate limits. This is [one of the ways] Pushshift built its datasets.
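To show why a full firehose can fit under a rate limit, here is a minimal sketch. It is not necessarily what Pushshift actually did. It assumes item IDs are sequential base36 integers, that a "fullname" is a type prefix such as `t3_` (submissions) plus that ID, and that a lookup endpoint accepts up to 100 comma-separated fullnames per request. The helper names and the 60 requests/minute figure are hypothetical. Because each request covers a batch of consecutive IDs, throughput scales with the batch size rather than with the request count.

```python
# Hypothetical sketch: enumerate consecutive base36 item IDs and group them
# into comma-separated batches of "fullnames" (type prefix + base36 ID).
# Assumption: the lookup endpoint accepts up to 100 fullnames per request.

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(n: int) -> str:
    """Encode a non-negative integer as a lowercase base36 string."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def fullname_batches(start_id: str, count: int, prefix: str = "t3_", batch_size: int = 100):
    """Yield comma-separated batches of `count` consecutive fullnames from start_id."""
    start = int(start_id, 36)  # Python parses base36 natively
    ids = [prefix + to_base36(start + i) for i in range(count)]
    for i in range(0, len(ids), batch_size):
        yield ",".join(ids[i:i + batch_size])


def items_per_minute(requests_per_minute: int, batch_size: int = 100) -> int:
    """Items covered per minute if every request carries a full batch."""
    return requests_per_minute * batch_size


if __name__ == "__main__":
    print(list(fullname_batches("zz", 3)))  # IDs roll over from "zz" to "100"
    print(items_per_minute(60))  # hypothetical 60 requests/minute limit
```

Under that hypothetical 60 requests/minute limit, batches of 100 would cover 6,000 new items per minute, which is why polling by ID range can keep up with new content where crawling listings page by page would not.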
Reddit was a lot more open than any other site approaching its size!