by chipperyman573 3045 days ago
You wouldn't even need to mine it - https://github.com/HackerNews/API
3 comments
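
For anyone who wants to try the official API route, here is a rough sketch of walking item IDs. It assumes the Firebase endpoints documented in the HackerNews/API repo (`/v0/item/<id>.json` and `/v0/maxitem.json`); the helper names `item_url`, `fetch_json`, `iter_items` and `main` are made up for illustration. One request per item is slow at this scale, so treat it as a starting point only.

```python
import json
from urllib.request import urlopen

# Base URL as documented in the HackerNews/API README.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id):
    """Build the URL for a single item (story, comment, poll, ...)."""
    return f"{BASE_URL}/item/{item_id}.json"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_items(start_id, stop_id, fetch=fetch_json):
    """Yield items with IDs in [start_id, stop_id], skipping null responses."""
    for item_id in range(start_id, stop_id + 1):
        item = fetch(item_url(item_id))
        if item is not None:  # the API returns null for IDs with no item
            yield item


def main():
    # Hypothetical usage: print the five newest items.
    max_id = fetch_json(f"{BASE_URL}/maxitem.json")
    for item in iter_items(max_id - 4, max_id):
        print(item.get("id"), item.get("type"), item.get("by"))
```

Calling `main()` hits the live API. The `fetch` parameter is there so the loop can be exercised with a stub instead of real requests.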

Even easier: All Hacker News posts are ready to be analyzed in BigQuery.

- https://medium.com/@hoffa/hacker-news-on-bigquery-now-with-d...

Sorry, I meant analyze. I actually am writing a blog post right now on this (this thread was very inspiring). Should be up in a day or two at https://applecrazy.github.io/blog
Good luck downloading 16 million records...
Why? 16 million isn’t much, maybe 10 to 20 gigabytes if compressed.

I’ve seen IRC bouncers that have more messages stored for a single user (270 million, in fact).

You could do that with the Algolia API: https://github.com/minimaxir/get-all-hacker-news-submissions...

But as noted, BigQuery is more pragmatic.
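
For the Algolia route, a minimal sketch of the idea (not the linked script's actual code): page backwards through `search_by_date` by timestamp. It assumes the public HN Search endpoint `https://hn.algolia.com/api/v1/search_by_date` with its `tags`, `hitsPerPage` and `numericFilters` parameters, and that each hit carries a `created_at_i` Unix timestamp. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"


def search_url(before_ts=None, hits_per_page=1000):
    """Build a query for stories, optionally only those older than before_ts."""
    params = {"tags": "story", "hitsPerPage": hits_per_page}
    if before_ts is not None:
        params["numericFilters"] = f"created_at_i<{before_ts}"
    return f"{SEARCH_URL}?{urlencode(params)}"


def fetch_json(url):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_stories(fetch=fetch_json, hits_per_page=1000):
    """Yield story hits newest first, one page at a time, until none remain."""
    before_ts = None
    while True:
        hits = fetch(search_url(before_ts, hits_per_page))["hits"]
        if not hits:
            return
        yield from hits
        # Continue strictly before the oldest hit seen so far. Stories that
        # share that exact second with the page boundary can be skipped.
        before_ts = min(hit["created_at_i"] for hit in hits)
```

Walking by timestamp rather than page number is a design choice here, on the assumption that deep pagination on a single query is capped. A real run should also sleep between requests to stay within rate limits.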

I have a script running on AWS right now.