|
|
|
|
|
by Quatschmann
3500 days ago
|
|
Yep, that's why I've build my own, the existing ones don't give out a list of the links or are super slow. A co-worker made the first one in Python but it was so slow that it took hours (6+ sometimes) to finish a site and I thought "you can do that faster". Problematic are some sites that don't use <a href="asd.com"> tags because that's what my crawler is looking for. C# & Elixier & Rust where the the other options I thought about and I want to build the same crawler on these languages (relative easy to do with ~300 LOC) to compare them for network / server / cli stuff but that has to wait till next year. |
|
There is some super useful stuff too though that made it easy to write a generic extensible crawler. My implementation ended up supporting separately compiled plugins you could just dump in a 'plugins\' directory, which responded to events and had full ability to manipulate the output pipeline. Do-able in lots of languages, but c# has some formalized helpers around it that make it super easy.