Hacker News
by hex12648430 3541 days ago
If you are a researcher looking for a 4chan dataset spanning a much longer time period and many different boards, I would recommend the archive.moe database dump[0], which was uploaded to the Internet Archive after the owner decided to cease operations.

[0]: https://archive.org/download/archive-moe-database-201506

I was recently working on a project to identify pepes in images and dumped a bunch of images from the /r9k/ archive using the 4chan API.
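
For anyone wanting to try something similar, here is a rough sketch of what such a dump could look like. It is not the script I used. The endpoints (`a.4cdn.org/{board}/archive.json`, `a.4cdn.org/{board}/thread/{no}.json`, and images at `i.4cdn.org/{board}/{tim}{ext}`) are how I understand the public read-only API to be laid out, so check them against the current API docs. The function names and the `limit` parameter are made up for illustration.

```python
import json
import time
import urllib.request

# Assumed base URLs for the read-only JSON API and the image host.
API_BASE = "https://a.4cdn.org"
IMG_BASE = "https://i.4cdn.org"


def image_urls(board, thread):
    """Return image URLs for every post in a parsed thread JSON that has an attachment."""
    urls = []
    for post in thread.get("posts", []):
        # Posts with an attachment carry "tim" (server-side file name) and "ext".
        if "tim" in post and "ext" in post:
            urls.append(f"{IMG_BASE}/{board}/{post['tim']}{post['ext']}")
    return urls


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def archived_image_urls(board, limit=5):
    """Yield image URLs from the first `limit` archived threads of a board."""
    thread_ids = fetch_json(f"{API_BASE}/{board}/archive.json")
    for no in thread_ids[:limit]:
        thread = fetch_json(f"{API_BASE}/{board}/thread/{no}.json")
        yield from image_urls(board, thread)
        time.sleep(1)  # stay well under the API's request rate limit
```

Something like `list(archived_image_urls("r9k"))` would then give you the URLs to download. Keeping the URL building in `image_urls` separate from the network calls makes it easy to test against a saved thread JSON.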

If you are a researcher looking for a 4chan dataset for supervised learning, be aware that labeling the images is not a task for the faint of heart.

When you browse an image board, you apply an implicit image filter: you probably only open threads that interest you and that have been alive for a while. No such filter applies to an image dump of all threads.