
Loop/spam prevention was handled by mixnode; I'm not sure how they do it.

The crawl does not follow a DFS or BFS pattern, so the number of pages per site varies greatly with each host's server capacity and anti-crawling configs.

There was a minimum of 10 seconds between follow-up requests to the same website, unless robots.txt specified a lower delay. Pretty polite...
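
For anyone curious what that kind of per-host politeness rule might look like, here is a rough Python sketch. It is not mixnode's code (I don't know how they implement it); the names `effective_delay` and `HostThrottle` are made up for illustration, and it reads the rule literally: default to 10 seconds, and use the robots.txt `Crawl-delay` only when it is lower. What happened when robots.txt asked for a longer delay isn't stated, so the sketch just keeps the default in that case.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

DEFAULT_DELAY = 10.0  # seconds between requests to one site, per the comment above


def effective_delay(robots_txt, user_agent="*", default=DEFAULT_DELAY):
    """Return the delay to use for a host, given its robots.txt body.

    Assumption: a Crawl-delay lower than the default wins; anything else
    (missing, or higher) falls back to the default.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.crawl_delay(user_agent)  # None if no Crawl-delay applies
    if declared is not None and declared < default:
        return float(declared)
    return default


class HostThrottle:
    """Hypothetical helper: tracks when each host may next be requested."""

    def __init__(self):
        self._next_allowed = {}  # hostname -> earliest time of next request

    def wait_time(self, url, delay, now):
        """Return how long to wait before fetching url, and reserve the slot."""
        host = urlsplit(url).hostname
        start = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = start + delay
        return start - now
```

In a real crawler you would pass `time.monotonic()` as `now` and sleep for the returned duration before fetching. Requests to different hosts don't block each other, which fits the observation that page counts per site depend on the host rather than on a fixed traversal order.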