| There are two main reasons why I say nobody besides Google is really allowed to crawl the web. The first is that Google gets much more access to pages on websites than everybody else. You can see this by examining the robots.txt files of various websites[0]. I've been doing this for several years now, and Google has a consistent advantage across the many thousands of websites I've looked at. This adds up to a significant advantage, and many search engine operators complain about how it hampers their ability to compete with Google[1]. The second is that Google gets to ignore the crawl-delay directive in robots.txt while other search engines don't[2]. Website operators cannot tell Google how fast they want their website crawled; they can only request that Google slow down. If another search engine tried to do what Google does, it would likely be blocked by many important websites. If you would like to read more about this, please check out https://knuckleheads.club/ [0] https://pdf.sciencedirectassets.com/robots.txt [1] https://www.nytimes.com/2020/12/14/technology/how-google-dom... [2] https://www.seroundtable.com/google-noindex-in-robots-txt-de... |
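
If you want to check a robots.txt file for this kind of asymmetry yourself, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `OtherBot` user agent, the `access_report` helper, and the example.com URL are all made up for illustration and are not taken from any real site. Note that the parser only reports what the file says; whether a given crawler actually honors a crawl delay is up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt written for illustration only (not copied from a real site).
# It gives Googlebot full access and restricts every other crawler.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Crawl-delay: 10
Disallow: /articles/
"""


def access_report(robots_txt, agents, url):
    """Return {agent: (allowed, crawl_delay)} for one URL under the given rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return {
        agent: (parser.can_fetch(agent, url), parser.crawl_delay(agent))
        for agent in agents
    }


# "OtherBot" is a made-up user agent standing in for any non-Google crawler.
report = access_report(
    SAMPLE_ROBOTS_TXT,
    ["Googlebot", "OtherBot"],
    "https://example.com/articles/1",
)

for agent, (allowed, delay) in report.items():
    print(f"{agent}: allowed={allowed}, crawl_delay={delay}")
```

With this sample file, Googlebot is allowed to fetch the URL and has no crawl delay set, while OtherBot is refused and told to wait 10 seconds between requests. To look at a live site, `RobotFileParser` also has `set_url()` and `read()`, which fetch the file over the network.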