I think they probably avoid having to crawl _every_ page of every web site in that fashion. They must be doing some sort of analysis so they can "old-school-crawl" the pages that don't need JavaScript interpretation.
What if they don't actually "render" the DOM as part of the "load" analysis? That would mean they don't necessarily need to handle certain UI/UX aspects that can be bypassed. They could then pass the "rendered" content through to the same system that does their general crawl analysis for additional details.
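To make the triage idea concrete, here's a rough sketch of how a crawler _could_ decide which pages go to the plain crawl and which get queued for JS interpretation. This is purely my guess at one possible heuristic, not anything a search engine has documented: the `needs_js_render` name and the 200-character threshold are made up for illustration.

```python
from html.parser import HTMLParser


class _TextAndScriptCounter(HTMLParser):
    """Counts <script> tags and visible text characters in raw HTML."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see.
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def needs_js_render(html, min_text_chars=200):
    """Hypothetical heuristic: the page has scripts but almost no static text.

    The threshold is an arbitrary assumption; a real system would tune it.
    """
    counter = _TextAndScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

A page that is just an empty `<div id="root">` plus a script bundle would get flagged for rendering, while a page with plenty of static text would go through the ordinary crawl.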
The work could be broken up in any number of ways. From my own testing, and from what I've seen of others' testing, content crawls/recrawls of JS-generated data tend to lag a couple of days behind the initial scan. Keeping an up-to-date sitemap XML resource is a good idea for "new" content if you're serving JS-based content. Also, rescans will still lag well behind the general non-JS content scans.
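For the sitemap part, here's a minimal sketch of regenerating the XML whenever content changes. It assumes you can get a list of (URL, last-modified) pairs out of your own backend; the `build_sitemap` name and the example URLs are just placeholders, and the output follows the standard sitemaps.org `urlset` format.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (url, modified) pairs.

    `modified` is expected to be a timezone-aware datetime.
    """
    # Serialize the sitemap namespace as the default (unprefixed) namespace.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, modified in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        # lastmod is written as a UTC timestamp in W3C datetime format.
        stamp = modified.astimezone(timezone.utc).isoformat(timespec="seconds")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = stamp
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder data; in practice this would come from your CMS or database.
    pages = [
        ("https://example.com/", datetime.now(timezone.utc)),
        ("https://example.com/new-post", datetime.now(timezone.utc)),
    ]
    print(build_sitemap(pages))
```

The point is just to keep `lastmod` current for new or changed JS-driven pages, so the crawler has a hint about what to revisit without having to discover it through rendering.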