Hacker News
by antisthenes 1102 days ago
There are public data dumps of Reddit comments available all the way up to December 2022. And they're only about 2TB all together.

There's nothing stopping AI companies from just using those instead of paying Reddit $50 million to scrape all of them through the API. It would also be 10x-100x quicker than hammering the API for the comments (the API sucks for mass data retrieval).

2 comments

Sure, but companies doing that also wouldn’t be paying Reddit for that data.

The point of shredding comments isn’t to hurt the companies scraping the data (although that might be a nice side effect). Ultimately it’s to hurt Reddit.

Where would someone find these?
Thanks, good fellow :)
Happy to help a friend.