by Hitton
2463 days ago
Disclaimer: I have rather little experience with Golang and only skimmed the crawler code. From what I could see, the author made an effort to make the crawler distributed with k8s (which I don't think is needed, considering there are only approximately 75,000 onion addresses) using modern buzzword technology, but the crawler itself is rather simplistic. It doesn't even seem to index/crawl relative URLs, just absolute ones.
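For what it's worth, here is a rough sketch of what handling relative URLs could look like. This is not the crawler's code (that is in Go); it is a Python illustration using only the standard library, and `LinkCollector`, `extract_links` and the `example.onion` host are made-up names for the example. Each `href` is resolved against the URL of the page it was found on, and anything that is not http(s) is dropped.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects <a href> targets, resolved against the page's own URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # urljoin leaves absolute URLs alone and resolves relative ones
        # ("../about", "/index", "page2.html") against the base URL.
        absolute = urljoin(self.base_url, href)
        # Skip mailto:, javascript: and other non-web schemes.
        if urlsplit(absolute).scheme in ("http", "https"):
            self.links.append(absolute)


def extract_links(base_url, html):
    """Return every crawlable link in `html` as an absolute URL."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    # Hypothetical page with one relative link, one absolute link, one mailto.
    page = (
        '<a href="../about">about</a>'
        '<a href="http://other.onion/">other</a>'
        '<a href="mailto:x@y">mail</a>'
    )
    for link in extract_links("http://example.onion/dir/page.html", page):
        print(link)
```

A crawler that only matches absolute URLs would find one link on that page instead of two.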
These things can happen in parallel but let’s also assume no more than 32 simultaneous TCP connections per host through a Tor proxy.
So we’re looking at ~75k × 100 × 5 / 32 seconds ≈ 14 days to run through all of them. You may not need to distribute this but there are situations (e.g. I want a fresh index daily) where it is warranted.
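The back-of-envelope estimate can be checked with a few lines of Python. Only the numbers (75k, 100, 5, 32) come from the comment; reading 100 as pages per site and 5 as seconds per page is my assumption, and the function and parameter names are just for illustration.

```python
import math

SECONDS_PER_DAY = 86_400


def crawl_days(sites, pages_per_site, seconds_per_page, parallel_connections):
    """Days for one host to crawl everything at a fixed level of parallelism."""
    total_seconds = sites * pages_per_site * seconds_per_page / parallel_connections
    return total_seconds / SECONDS_PER_DAY


def hosts_needed(single_host_days, target_days):
    """Hosts required to finish within target_days, assuming perfect scaling."""
    return math.ceil(single_host_days / target_days)


if __name__ == "__main__":
    # Assumed meaning of the factors: 100 pages/site, 5 s/page (not stated above).
    days = crawl_days(75_000, 100, 5, 32)
    print(f"{days:.1f} days on one host")  # 13.6 days on one host
    print(hosts_needed(days, target_days=1), "hosts for a daily index")  # 14 hosts
```

So a daily refresh would take roughly 14 hosts at 32 connections each, under the optimistic assumption that throughput scales linearly with the number of hosts.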