Hacker News new | ask | show | jobs
by beejiu 3405 days ago
I've just run a small test (crawling a server running locally) and it comes out at 243 pages per second with one process. This crawls a webpage, adds its links to the queue and saves the URL in a Redis set. This is running on a Macbook Pro.
3 comments

So you eliminated the biggest cause of slow down in crawling, network latency, and are asserting yours is faster?
The selling point of Node.js is asynchronous I/O. I'm sure you mean bandwidth rather than network latency - in which case that is really not a limiting factor when running in a datacenter (40 Gbps in at Linode for example).
Some library in python ,such as asyncio or gevent could do some work asynchronously and efficently. I will have a test later for these library. In the meanwhile , welcome to post more details about asynchronous of Node.js. Thanks for your comment again!
Network latency would be a valid concern for avg. time/req. not for throughput (necessarily)
Nice! (compared to the article's 100,000 pages / 15 minutes / 60 seconds = ~110 pages per second)

Did you happen to track memory usage at all? It would take a while to settle down, for sure.

I'm always interested in the amount of overhead Docker brings to the table. No biggie either way, thanks for sharing these details.

Thanks for your comment, I will have a test with a local server later and find out what's the upper limit of mine