Hacker News
by maceurt 2670 days ago
Yes, this could be cool; it does not seem too simple or too hard.

On a related note, a lot of different comments seem to mention scraping some sort of data from a website on a continual basis. Would I just create a script attached to an extra worker that sends its data to the actual database, which would in turn be read by the web server? Or would I want the web server itself to get the data and write it to the database?

1 comment

A couple of years ago I wrote a feed reader which would check every hour for new items in a few hundred feeds. The script ran on a $5/mo server (initially it ran off an old laptop I had, for easier debugging) and it would post the new data to the database located on the website server. So I was using two machines, one for the crawler and one for the website, and two databases too, I think. The one for the feed crawler was very simple: it held only the list of feed URLs and the URL of the latest item seen in each, so that I wouldn't show the same item again. That was the theory, at least; feeds are a bit more complicated in real life.
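
To make the "latest item URL" bookkeeping concrete, here is a minimal sketch in Python. It is not the original script: the function names and the in-memory `state` dict are hypothetical stand-ins for the crawler's small database, and it assumes each feed lists its items newest first. Fetching, parsing and posting to the website database are left out.

```python
from __future__ import annotations


def select_new_items(item_urls: list[str], last_seen_url: str | None) -> list[str]:
    """Return the item URLs that come before last_seen_url.

    Assumes item_urls is ordered newest first. If last_seen_url is None
    or is no longer in the feed, every item is treated as new.
    """
    fresh = []
    for url in item_urls:
        if url == last_seen_url:
            break  # everything from here on was seen on an earlier run
        fresh.append(url)
    return fresh


def update_feed(state: dict[str, str], feed_url: str, item_urls: list[str]) -> list[str]:
    """Pick out the new items for one feed and remember its newest item URL.

    state maps feed URL -> URL of the latest item seen (hypothetical
    stand-in for the crawler's own small database).
    """
    fresh = select_new_items(item_urls, state.get(feed_url))
    if item_urls:
        state[feed_url] = item_urls[0]
    return fresh


state: dict[str, str] = {}
first_run = update_feed(state, "https://example.com/feed", ["c", "b", "a"])
second_run = update_feed(state, "https://example.com/feed", ["e", "d", "c", "b"])
```

On the first run every item counts as new; on the second run only the items above the remembered one are returned. One of the real-life complications shows up here too: if the remembered item has already dropped out of the feed, or the feed is not ordered newest first, this simple check returns duplicates.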

That's what I did, but I might have had different requirements. If you don't have a lot to crawl and you don't have to do it very often (once a week or less), you can probably run the crawler on the web server itself and space out the requests enough that the server doesn't feel it. Some caching for the website itself also helps a lot in that case. I think it depends a lot on the requirements of the project. But using two machines is safer, I think, although it might complicate things a bit.
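
As a rough illustration of spacing out the requests, the sketch below spreads a list of fetches evenly over a time window. The names `crawl_spaced`, `fetch` and `window_seconds` are hypothetical, and `fetch` is whatever function actually downloads a URL; the sleep function is passed in only so the loop can be tried without really waiting.

```python
import time
from typing import Callable, Iterable, List


def crawl_spaced(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    window_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL in turn, pausing between requests so that the
    whole crawl is spread over roughly window_seconds."""
    url_list = list(urls)
    if not url_list:
        return []
    delay = window_seconds / len(url_list)
    results = []
    for index, url in enumerate(url_list):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Dry run: record the pauses instead of sleeping, and use a fake fetch.
pauses: List[float] = []
pages = crawl_spaced(["a", "b", "c"], str.upper, 30, sleep=pauses.append)
```

For a weekly crawl of a small site, a window of an hour or more would keep the load on a shared machine close to unnoticeable, though the right value depends on the project.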

Keep in mind that there's probably better technical advice out there than mine. I'm a hobbyist developer.