by eastdakota
5502 days ago
CloudFlare doesn't block legit crawlers either. It does cache responses to crawlers, so if a page hasn't changed and Google crawls it again, the request doesn't burden the origin. What's interesting about CDN in a Box is that they're serving off a single IP. The problem with this strategy is that Google classifies sites for crawl purposes by IP. That means if one site on CDN in a Box falters, all the other sites on CDN in a Box will suffer (e.g., Google turning down crawl velocity or completely removing them from the index). The same problem occurs if any site there is spammy or compromised by malware. At CloudFlare, we tried the CDN in a Box strategy when we launched more than a year ago. We quickly found it had serious negative impacts on site rankings. We spent considerable time working directly with Google and the other search engine crawl teams on a solution. Today, sites on CloudFlare actually get the highest crawl velocity setting because of this work, which we've seen positively impact site rankings. I'm curious to hear more about CDN in a Box's plans, discussions with search engine crawler teams, and technologies they've developed to overcome this challenge.
|
You keep saying we serve off of one IP. That is blatantly false.
I'll put my crawl rate up against anyone's. We had to have a conversation with Google's team because their bot hit one of our sites for 1.2M crawled pages in 3 hours. Which is nice, but then they did it again the next day, and the next. So we are negotiating to not have to pay for Googlebot traffic.
In Webmaster Tools you can't even change the setting of a CDN in a Box site, because Google assigns you a special crawl rate.
Bing's bot loves us, because it will often crawl "all pages at once": it will crawl 10k pages in 30 seconds and go on to the next site.