by generalizations 847 days ago
Can confirm. A few discrete scripts, each focused on one part of the process, let the whole pipeline run asynchronously, with each stage working on its own schedule, and you naturally end up storing the pages on disk for later scripts to process. This is especially true if you write a dedicated downloader: then you can really go nuts optimizing and randomizing the download parameters for each individual link in the queue. "Do one thing and do it well" FTW.
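For illustration, here is a minimal sketch of that layout in Python. It is not the commenter's code. The two stages talk only through files on disk. The `download` stage only fetches, picks a fresh random User-Agent, timeout and delay for every link, and skips pages it already has. A separate `process` stage only reads what was saved. The User-Agent pool, the parameter ranges, the `pages/` directory and the `urls.txt` queue file are all hypothetical placeholders. The fetcher is pluggable, so you can swap in your own HTTP client.

```python
import hashlib
import random
import sys
import time
import urllib.request
from pathlib import Path

# Hypothetical pool of User-Agent strings; substitute your own.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
]


def http_fetch(url: str, headers: dict, timeout: float) -> bytes:
    """Default fetcher: a plain GET using only the standard library."""
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def page_path(out_dir: Path, url: str) -> Path:
    """Stable file name per URL, so later stages can locate each page."""
    return out_dir / (hashlib.sha256(url.encode()).hexdigest() + ".html")


def download(urls, out_dir, fetch=http_fetch, rng=None, sleep=time.sleep):
    """Stage 1: only downloads. Every link gets its own randomized
    parameters, and each page is written to disk for later stages."""
    rng = rng or random.Random()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = page_path(out_dir, url)
        if target.exists():  # fetched on an earlier run; don't hit it again
            saved.append(target)
            continue
        headers = {"User-Agent": rng.choice(USER_AGENTS)}
        timeout = rng.uniform(5.0, 15.0)  # per-link timeout (hypothetical range)
        sleep(rng.uniform(0.5, 3.0))      # per-link randomized delay
        try:
            body = fetch(url, headers, timeout)
        except OSError as exc:  # URLError, HTTPError and timeouts subclass OSError
            print(f"skip {url}: {exc}")
            continue
        target.write_bytes(body)
        saved.append(target)
    return saved


def process(out_dir):
    """Stage 2: a separate step that only reads stored pages
    (here it just reports each page's size in bytes)."""
    return {p.name: len(p.read_bytes()) for p in sorted(Path(out_dir).glob("*.html"))}


if __name__ == "__main__":
    # Hypothetical queue: one URL per line in urls.txt.
    queue = Path("urls.txt").read_text().split()
    download(queue, "pages")
    print(process("pages"), file=sys.stderr)
```

The stages share nothing but the `pages/` directory. The downloader can therefore be restarted, throttled or run on another machine without touching the parsing code. Re-running it resumes where it left off, because saved pages are skipped.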