Hacker News
by doctorslimm 204 days ago
why is this not on huggingface as a dataset yet? is anyone putting this on huggingface?
2 comments

Maybe you skimmed past this from TFA:

"Well, the first problem I had, in order to do something like that, was to find an archive with Hacker News comments. Luckily there was one with apparently everything posted on HN from the start to 2023, for a huge 10GB of total data. You can find it here: https://huggingface.co/datasets/OpenPipe/hacker-news and, honestly, I’m not really sure how this was obtained, if using scraping or if HN makes this data public in some way."