HN Mirror



	by nlogn 5656 days ago
	I don't think they even need to blacklist Google. They should just respect robots.txt. Edit: I should have clarified. I know that the Bing crawler likely respects robots.txt, but if they are using clickstream info to build their index, it seems right that they should respect robots.txt there as well, no?

3 comments

stanleydrew 5656 days ago

I'm pretty sure the Bing crawler does respect robots.txt. The data Bing collected didn't come from spidering Google.


cgoddard 5656 days ago

You could strongly argue that collecting clickstream and other user browser session info via a toolbar is not a form of web robot (crawler, spider, etc.), and thus robots.txt does not apply.


FreebytesSector 5656 days ago

I agree with your comment that toolbars should respect robots.txt: even if a human is doing the crawling, an automated system is still indexing information from that site. I would not want a toolbar sending data back to Bing based on my queries on a company intranet or on a site that would not normally be indexed. Personal data entered into what the toolbar took to be a query field could also be sent onward, even if the site's robots.txt restricted it. I think they should respect robots.txt in this case, even if they are only monitoring user behavior.
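
To make that concrete, here is a rough sketch of the check a toolbar could run before reporting a visited URL. It uses Python's standard `urllib.robotparser`; the agent name `ExampleToolbar`, the helper `may_report`, and the intranet URLs are all made up for illustration, and nothing here reflects how any real toolbar works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent token for the toolbar (an assumption, not a real product).
TOOLBAR_AGENT = "ExampleToolbar"


def may_report(robots_txt: str, page_url: str, agent: str = TOOLBAR_AGENT) -> bool:
    """Return True if the site's robots.txt would allow this agent to fetch page_url."""
    parser = RobotFileParser()
    # Parse the robots.txt text directly instead of fetching it over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, page_url)


# Example robots.txt for a hypothetical intranet that blocks its search pages.
INTRANET_ROBOTS = """\
User-agent: *
Disallow: /search
"""

if __name__ == "__main__":
    for url in (
        "http://intranet.example/search?q=payroll",
        "http://intranet.example/about",
    ):
        verdict = "report" if may_report(INTRANET_ROBOTS, url) else "skip"
        print(url, verdict)
```

Under these assumptions, the toolbar would drop the search URL and only consider reporting the other page. Whether clickstream collection counts as a "robot" at all is the question the other replies raise.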
