by josefcullhed
1554 days ago
Founder here. I suggest you start by not implementing a crawler and using commoncrawl.org instead. The problem with starting a web crawler is that you will need a lot of money, and almost all big websites are behind Cloudflare, so you will be blocked pretty quickly. Crawling is a big issue, and most of the problems are non-technical.
|
Some sort of partnership between crawlers could go a long way. Have you considered contributing content back to Common Crawl?