Ask HN: Alternatives to Common Crawl?

I'm trying to use Common Crawl for an ML project/search engine, but:

- Requests to download even a small amount of data get rate limited (it says "slow down" / "too many requests").
- It seems like this is a known issue and that Common Crawl is no longer well maintained: https://groups.google.com/g/common-crawl/c/BvMGYUY-dro

Are there any alternatives for accessing a large amount of web crawl data?

Thanks!