by jakabia 1098 days ago
All of Reddit (posts and comments separately) from 2005-06 through 2022-12 is available via this torrent [1], and it's very easy to download, extract and use the data [2]. I'm writing my thesis on the connection between a Reddit post's type and its comment structure, and I've been working with this data for a few months. It's amazing.

[1] https://academictorrents.com/details/7c0645c94321311bb05bd87...

[2] https://github.com/Watchful1/PushshiftDumps
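
For anyone wondering what "extract and use" might look like in practice, here is a minimal Python sketch. It makes several assumptions that are not stated above: that each dump file is zstd-compressed newline-delimited JSON, that comment records carry a `link_id` field naming the parent post, and that the third-party `zstandard` package is installed for `open_dump`. The function names are made up for illustration and are not taken from [2].

```python
import io
import json
from collections import Counter


def open_dump(path):
    """Open a zstd-compressed dump as a text stream (assumed file format).

    Needs the third-party `zstandard` package. The large window size is an
    assumption about how the dumps were compressed.
    """
    import zstandard  # imported lazily so the rest works without it

    fh = open(path, "rb")
    reader = zstandard.ZstdDecompressor(max_window_size=2**31).stream_reader(fh)
    return io.TextIOWrapper(reader, encoding="utf-8")


def iter_records(lines):
    """Yield one dict per JSON line, skipping blank or unparseable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def comments_per_post(lines):
    """Count comments per post, keyed by the (assumed) `link_id` field."""
    counts = Counter()
    for record in iter_records(lines):
        link_id = record.get("link_id")
        if link_id:
            counts[link_id] += 1
    return counts


# Tiny in-memory stand-in for a real comment dump (made-up records).
SAMPLE = """{"id": "c1", "link_id": "t3_aaa", "body": "first"}
{"id": "c2", "link_id": "t3_aaa", "body": "second"}

not valid json
{"id": "c3", "link_id": "t3_bbb", "body": "third"}
"""

print(comments_per_post(SAMPLE.splitlines()).most_common())
```

On a real file, something like `comments_per_post(open_dump(path))` would stream the dump line by line rather than loading it all into memory, which matters at this data size.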

1 comment

Does possessing a copy of this dataset open you up to Subject Access Requests (and their equivalents in other jurisdictions)?
I don't know exactly how they work, but GDPR has some dispensations when data is used for academic purposes.