Source addresses: Googlebot traffic comes from a small set of IP address blocks owned by Google.
Third parties bake this into things like Web Application Firewall (WAF) rules. For example, Azure App Gateway WAF has a policy category for “known bots” which includes Google but excludes your tiny AI startup.
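A rough sketch of the kind of source-IP allowlist check being described. This is not how Azure's WAF is actually implemented. `KNOWN_BOT_NETWORKS`, `is_known_bot`, and `waf_decision` are hypothetical names, and the networks are RFC 5737 documentation ranges standing in for a real "known bots" list:

```python
import ipaddress

# Hypothetical "known bots" allowlist. These are RFC 5737 documentation
# ranges used as placeholders, NOT Google's real crawler IP blocks.
KNOWN_BOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_known_bot(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when IP versions differ (IPv6 addr vs IPv4 net).
    return any(addr in net for net in KNOWN_BOT_NETWORKS)


def waf_decision(client_ip: str, user_agent: str) -> str:
    """Toy rule: crawler-looking traffic must come from an allowlisted block."""
    looks_like_bot = "bot" in user_agent.lower()
    if looks_like_bot and not is_known_bot(client_ip):
        return "block"  # unknown crawler, e.g. a small startup's bot
    return "allow"


print(waf_decision("192.0.2.10", "ExampleBot/1.0"))       # allow
print(waf_decision("203.0.113.5", "TinyStartupBot/0.1"))  # block
```

The point of the sketch: the crawler's identity is effectively its source network, so a new crawler is blocked no matter how well it behaves until it appears on someone's list.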
It’s a moat built by giant corporations to keep tiny players in their place.
I agree it’s a moat, but why would Azure shield Google from competition? I think it’s just another example of anti-abuse collateral damage, like email anti-spam filters blocking small unknown servers or Cloudflare blocking unknown IPs.
Because Azure customers want Google to be able to index their sites.
Would you host your e-commerce or social media site on a cloud provider that blocked Googlebot?
It's not Microsoft giving Google a handout out of the goodness of its heart; it's Azure customers demanding that functionality. (Those Azure customers also don't care about a random little search startup and probably don't want to pay egress fees to serve traffic to it.)