Hacker News
by RealityVoid 1630 days ago
Huh, I wonder: can I download Reddit? Like, all the text posts, ignoring images. I wonder how big a DB that is and how hard it would be to crawl it myself. It can't be more than a few GB of data. I mean, at this point there is a lot of information there that is just begging to be leveraged.
1 comment

Pushshift has monthly comment[1] and submission data dumps that you can download. The June 2021 comment dump alone was 20+ GB, compressed with Zstandard (.zst).

[1]- https://files.pushshift.io/reddit/comments/
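
If you want to poke at one of those dumps without unpacking the whole thing to disk, something like this rough sketch might work. It assumes the file is newline-delimited JSON (one object per line) with "subreddit" and "body" fields, and it assumes you have the third-party zstandard package installed; none of that is stated above, so check it against the actual file. The function names and the file path are made up for illustration.

```python
import io
import json
from typing import BinaryIO, Iterator, Optional


def iter_records(stream: BinaryIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    text = io.TextIOWrapper(stream, encoding="utf-8", errors="replace")
    for raw in text:
        line = raw.strip()
        if line:
            yield json.loads(line)


def comment_bodies(stream: BinaryIO, subreddit: Optional[str] = None) -> Iterator[str]:
    """Yield comment text, optionally restricted to one subreddit.

    Assumption: each record has "subreddit" and "body" keys.
    """
    for rec in iter_records(stream):
        if subreddit is None or rec.get("subreddit") == subreddit:
            yield rec.get("body", "")


def open_zst(path: str) -> BinaryIO:
    """Open a .zst file as a decompressed byte stream.

    Assumption: the third-party `zstandard` package is installed. The large
    max_window_size is there because big dumps may be compressed with a
    long window; lower it if your file does not need it.
    """
    import zstandard  # imported here so the rest works without it

    fh = open(path, "rb")
    return zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)


if __name__ == "__main__":
    # Hypothetical local path to one monthly comment dump.
    for body in comment_bodies(open_zst("comments_dump.zst"), subreddit="programming"):
        print(body[:80])
```

This reads and decodes one line at a time, so memory use stays small even though the decompressed data is far larger than the 20+ GB download.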