Hacker News
by minimaxir 854 days ago
It's still a bummer that Reddit had to kill all public data access completely in the process of locking down their data supposedly to prevent AI from scraping it. (but in reality, they did it in order to facilitate these deals)

The PushShift datasets were very useful for giving demos on how big data can be analyzed with more intuitive and interesting results. Here's an older blog post of mine analyzing Reddit data from 2018: https://minimaxir.com/2018/09/modeling-link-aggregators/

I would not be in data science/machine learning now if it weren't for Reddit data.

2 comments

The problem is it's not their data; they aggregate everyone else's, including their users'. It's always bothered me how entitled they sounded about the API lockdown when it was very clearly not about protecting anyone's data.

> supposedly to prevent AI from scraping it. (but in reality, they did it in order to facilitate these deals)

I don't think that was ever a secret. If you say "I don't want AI to use my data", there's a pretty obvious "for free" implied in there.