|
Todd from Prerender.io here. We always knew this day would come eventually :) We are currently serving around 60 million prerendered pages to crawlers every day, with Google being about half of those requests. We are recaching around 1 billion pages every month in PhantomJS/Headless Chrome. Google is the only crawler executing a meaningful amount of JavaScript, so Bing, Baidu, Yandex, Facebook, Twitter, and other SEO crawlers still need prerendering. For anyone who needs to update their own crawlers to match Google’s new JavaScript crawling, we’ve opened up our prerendering engine that uses Headless Chrome at https://prerender.com. You can capture HTML, screenshots, PDFs, or even HAR files from any web page with just an HTTP request to our service. So it’s super easy to add JavaScript crawling to any crawler with Prerender.com (and it’s open source: https://github.com/prerender/prerender). For our Prerender.io customers, this announcement just means that Google will stop crawling ?_escaped_fragment_= URLs, so they won’t request prerendered pages anymore. Instead, Google will just execute the JavaScript directly and index the result. We’ve always recommended that our customers use the escaped fragment protocol, so it will be a smooth transition as Google slowly stops crawling the ?_escaped_fragment_= URLs. No changes need to be made if you are currently using Prerender.io. Keep an eye on our Twitter (@prerender) and we’ll give updates on Google’s transition. The one thing to watch when Google starts executing your JavaScript is the number of pages crawled by Google, as reported in your Google Webmaster Tools. In the past, we’ve seen that Google crawls much slower when executing the JavaScript themselves. Hopefully JavaScript websites don’t take a hit in the number of pages crawled daily, since that can make it harder for large sites to keep all of their pages up to date in Google's index. |
Assuming renders take 7-10 seconds at worst, that means (if I've got my math right!) that you need to keep between (60m/(86400/7)=4861) and (60m/(86400/10)=6944) renders in flight at any given moment in order to keep up, which works out to about 694 renders finishing every second. (86400 = seconds in a day)
...Ahahahaha :)
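For anyone who wants to poke at the numbers, here is a rough sketch of that arithmetic in Python. The function names are mine, and it assumes the worst case where every one of the 60m daily pages is rendered fresh (no cache hits) and the load is spread evenly over the day, neither of which is likely to hold in practice.

```python
SECONDS_PER_DAY = 86_400


def renders_per_second(pages_per_day: float) -> float:
    """Average completion rate needed to get through pages_per_day."""
    return pages_per_day / SECONDS_PER_DAY


def concurrent_renders(pages_per_day: float, seconds_per_render: float) -> float:
    """Renders that must be in flight at once (rate * time per render)."""
    return renders_per_second(pages_per_day) * seconds_per_render


if __name__ == "__main__":
    pages = 60_000_000  # the "60 million pages a day" figure from the parent comment
    print(f"rate: {renders_per_second(pages):.1f} renders/second")
    for seconds in (7, 10):  # assumed worst-case render times
        in_flight = concurrent_renders(pages, seconds)
        print(f"{seconds}s per render: ~{in_flight:.0f} renders in flight")
```

Dividing the in-flight count by however many renders one machine can handle at a time gives a guess at the instance count, which is exactly the part I can't pin down.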
Given that a single Chrome instance on my new-but-not-particularly-amazing i3 box can be sluggish at the best of times... I have no idea what sort of tolerances Xeon(?)-class hardware (possibly running Xen? :P) has for running multiple entire copies of Chromium... I initially wondered if you needed 1000 compute instances, then I realized maybe you only needed 400, now I honestly don't know at all.
--
I'm also curious how using Headless Chrome and PhantomJS is working out. As in, genuinely interested. IIUC, PhantomJS has pretty much wound down, while Headless Chrome is fractionally different enough from Chrome that it's possible to tell which one you're running on (https://news.ycombinator.com/item?id=14936025). I've been idly curious about "perfectly sandboxing" webpages so they honestly can't tell they're not in a "normal" PC/laptop/mobile environment, and my impression is that I'd have to start with a _very_ carefully configured copy of normal Chromium in order to do it.
--
I must admit that I got curious about what 60m daily renders looked like against the pricing structure... but couldn't really figure it out, it's not a simple enough exponential curve (and I can't math for nuts). Single-stepping through the pricing algorithm was very interesting though ($1522 for enterprise, huh cool).
--
PS. The view-source link at the bottom is unfortunately broken; Chrome blocked opening such URLs recentlyish. Fixing it will likely require, ironically, a little server-side renderer :)
--
EDIT: One last thing, note https://news.ycombinator.com/item?id=15882066 from this thread