by nwh 4823 days ago
Google uses the "completion" feature of Google Chrome to collect new URLs to scrape. If you have that on, they crawl after your visit.
3 comments

Do you have any links that go into detail on this? I was intrigued by your comment and I want to read more about it but ironically I couldn't find anything on Google!
I can't find the paper I read on it either, but anecdotally I can confirm that it happens with Google and, oddly enough, AIM. I've had URLs that have never had an inbound link, and magically Googlebot rocks up when I show one to a Chrome user. I'll keep looking for it.
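For anyone who wants to check this on their own server, here is a minimal sketch of scanning an access log for Googlebot requests to an unlinked URL. It assumes the log is in Combined Log Format; the function name `googlebot_hits` and the example path are hypothetical. The user-agent string can be spoofed, so a hit is a hint, not proof.

```python
import re

# Combined Log Format (assumed):
# host ident user [time] "METHOD path proto" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def googlebot_hits(lines, path):
    """Return (time, host, referer) for requests to `path` whose
    user agent claims to be Googlebot."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("path") == path and "Googlebot" in m.group("agent"):
            hits.append((m.group("time"), m.group("host"), m.group("referer")))
    return hits
```

Comparing the timestamps of these hits against the first human visit to the same path shows whether the crawler arrived only after a browser did.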
I'm also convinced they use Google Analytics to find new URLs. I've seen URLs that I only had in AJAX calls indexed before (and I fired events to GA on these URLs).
They have expressly denied that in the past. (Where "that" is "using user data for Google Analytics to expand the crawl set". They're also on the record as saying "no use of toolbar data.")

The more likely thing you are experiencing is Google reading your AJAX URLs, either by evaluating JS or by using heuristics. Google is known to do both of these, but a lot of HNers get surprised when I mention it, so FYI.
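To illustrate the heuristic route, here is a rough sketch of pulling URL-looking strings out of JavaScript source without executing it. This is not Google's actual method, which isn't public; the regex, the function name `extract_candidate_urls`, and the example URLs are all assumptions made up for illustration.

```python
import re
from urllib.parse import urljoin

# Hypothetical heuristic: any quoted string that looks like an absolute
# http(s) URL or a root-relative path is treated as a crawl candidate.
URL_LIKE = re.compile(r"""["'](https?://[^"'\s]+|/[A-Za-z0-9_\-./?=&%]+)["']""")

def extract_candidate_urls(js_source, base_url):
    """Return de-duplicated absolute URLs found in string literals,
    in order of first appearance."""
    seen = []
    for match in URL_LIKE.finditer(js_source):
        url = urljoin(base_url, match.group(1))  # resolve relative paths
        if url not in seen:
            seen.append(url)
    return seen
```

A crawler using something like this would discover an endpoint that only ever appears inside an AJAX call, with no analytics data involved.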

I once had a spammer hit one of my contact forms a few hundred times on a page set up to capture traffic from South Dakota. There was a corresponding goal set up in Google Analytics that triggered, and a week or so later the South Dakota page popped up as a sitelink in the SERPs. It certainly doesn't prove anything, but the page got essentially zero traffic, had no external inbound links, and wasn't weighted very heavily in terms of site architecture. Makes me wonder if there isn't some careful parsing of words in their claims. /removestinfoilhat
Are you implying that private URLs typed in the Chrome address bar might end up in the crawler queue?
People using Chrome with those settings enabled should probably read up on its privacy policy [1]. Features such as "use a web service to help resolve navigation errors" and "use a prediction service to help complete searches and URLs" send data to the default search provider.

Also, these features are enabled by default.

[1] https://www.google.com/chrome/intl/en/privacy.html