I'm trying to use Common Crawl for an ML project/search engine, but:
- Requests to download even a small amount of data get rate-limited (the server responds with "slow down"/"too many requests" errors).
- This seems to be a known issue, and it looks like Common Crawl is no longer well maintained: https://groups.google.com/g/common-crawl/c/BvMGYUY-dro
Are there any alternatives for accessing a large amount of web crawl data?
Thanks!