Ask HN: Can I store a segment of Reddit pages every minute?
1 points by Ian999 3552 days ago
I have some experience programming but want to work on a side project that would require me to have Reddit pages stored every minute. I would probably just work with the top subreddits to start.

I basically want the title, upvote count, and comment count of each post stored every minute. I want to do this for the new, rising, controversial, and hot listings.
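For illustration, a minimal sketch of one per-minute snapshot in Python. It assumes Reddit's public `.json` listing endpoints are reachable and that the request rate stays within whatever limits apply; the official API uses OAuth, which is left out here. The function names, the `User-Agent` string, and the use of `score` as the upvote figure are all assumptions made for the example.

```python
import json
import time
import urllib.request

# The four listings mentioned in the post.
SORTS = ("new", "rising", "controversial", "hot")


def listing_url(subreddit, sort, limit=100):
    """Build the URL of a public JSON listing (assumed endpoint shape)."""
    return f"https://www.reddit.com/r/{subreddit}/{sort}.json?limit={limit}"


def parse_listing(payload, subreddit, sort, fetched_at):
    """Reduce one listing response to the fields of interest."""
    rows = []
    for child in payload["data"]["children"]:
        post = child["data"]
        rows.append({
            "fetched_at": fetched_at,
            "subreddit": subreddit,
            "sort": sort,
            "post_id": post["id"],
            "title": post["title"],
            "score": post["score"],  # net score, used here as "upvotes"
            "num_comments": post["num_comments"],
        })
    return rows


def fetch_listing(subreddit, sort, user_agent="snapshot-sketch/0.1 (hypothetical)"):
    """Download one listing as parsed JSON. Not authenticated."""
    request = urllib.request.Request(
        listing_url(subreddit, sort),
        headers={"User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def snapshot(subreddits, fetch=fetch_listing, now=time.time):
    """Take one snapshot of every listing for every subreddit."""
    fetched_at = int(now())
    rows = []
    for subreddit in subreddits:
        for sort in SORTS:
            payload = fetch(subreddit, sort)
            rows.extend(parse_listing(payload, subreddit, sort, fetched_at))
    return rows
```

Calling `snapshot(["python", "programming"])` once a minute (from cron, or a loop with `time.sleep`) would give the continuous record described above. The `fetch` and `now` parameters exist only so the parsing can be tried without network access.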

I've done some research, but I can only find projects that pulled the data at a single point in time, not continuously, which is what I need.

Questions:

1. Do any of you know if this has already been done somewhere? (I don't want to reinvent the wheel.)

2. If I were to do this, what languages/resources should I use? (I will be interacting with their API)

3. I would be willing to spend a couple thousand dollars to store the data in the cloud, but I have no sense of how much it will cost. I figure that if I build this and run it for one day, I'll get an idea, but if anyone already knows that it will cost far more than that, please let me know.

4. Any other advice is welcome.
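
On the cost question, a back-of-envelope estimate can be made before building anything. The sketch below is only arithmetic: the number of subreddits, the 100 posts per listing, and the 200 bytes per stored row are placeholder assumptions, not measurements, and the function names are made up for the example.

```python
MINUTES_PER_DAY = 24 * 60


def requests_per_minute(subreddits, sorts=4):
    """One request per subreddit per listing, every minute."""
    return subreddits * sorts


def rows_per_day(subreddits, sorts=4, posts_per_listing=100):
    """Rows stored per day if every post in every listing is saved."""
    return subreddits * sorts * posts_per_listing * MINUTES_PER_DAY


def gigabytes_per_day(subreddits, bytes_per_row=200, sorts=4, posts_per_listing=100):
    """Uncompressed storage per day, in decimal gigabytes."""
    total_rows = rows_per_day(subreddits, sorts, posts_per_listing)
    return total_rows * bytes_per_row / 1e9
```

With these placeholder numbers, 25 subreddits works out to 100 requests per minute and 14.4 million rows per day, or about 2.9 GB uncompressed. Consecutive snapshots will likely be nearly identical, so storing only the rows that changed, or compressing, should shrink that considerably.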

Even though I would be willing to spend that money on a server, I don't want to pay someone to do this for me; I want to learn. If the data already exists somewhere, though, I may entertain paying for it.

1 comments

Thank you. I should have clarified that I know to use their API. But will this just be an absurd amount of data? I'm comfortable with Python/R. What other languages should I use? Any database languages?
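
On the database question, SQL is the usual answer, and Python's built-in `sqlite3` module is enough to try the idea out. The following is a sketch under assumptions: the `snapshots` table, its columns, and the helper names are hypothetical, and a larger deployment might prefer a server database such as PostgreSQL.

```python
import sqlite3

# One row per post, per listing, per snapshot time (hypothetical schema).
SCHEMA = """
CREATE TABLE IF NOT EXISTS snapshots (
    fetched_at   INTEGER NOT NULL,  -- Unix timestamp of the snapshot
    subreddit    TEXT    NOT NULL,
    sort         TEXT    NOT NULL,  -- new, rising, controversial, or hot
    post_id      TEXT    NOT NULL,
    title        TEXT    NOT NULL,
    score        INTEGER NOT NULL,
    num_comments INTEGER NOT NULL,
    PRIMARY KEY (fetched_at, subreddit, sort, post_id)
)
"""


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_rows(conn, rows):
    """Insert a batch of row dicts in a single transaction."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO snapshots VALUES "
            "(:fetched_at, :subreddit, :sort, :post_id, "
            ":title, :score, :num_comments)",
            rows,
        )


def score_history(conn, post_id):
    """Return (fetched_at, score) pairs for one post, oldest first."""
    cursor = conn.execute(
        "SELECT DISTINCT fetched_at, score FROM snapshots "
        "WHERE post_id = ? ORDER BY fetched_at",
        (post_id,),
    )
    return cursor.fetchall()
```

Each row passed to `save_rows` is a dict with the seven keys named in the `INSERT` statement. A query like `score_history(conn, "abc")` then shows how a post's score moved minute by minute, which is the kind of question continuous collection makes possible.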