HN Mirror



	by Xylakant 2043 days ago
	Note that robots.txt is only a hint to well-behaved crawlers; it does not block them in any way. You can block crawlers if you can identify them, but identifying them reliably is hard.
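
To illustrate why robots.txt is only advisory, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, the `example.com` URLs, and the `is_allowed` helper are all made up for illustration. The point is that the check runs inside the crawler: a crawler that never calls it is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is a check the crawler chooses to run on itself. The server
    has no way to force a crawler to perform it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public", "https://example.com/private/page"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A well-behaved crawler skips any URL for which this returns False; a badly behaved one simply requests the URL anyway.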


tleb_ 2042 days ago

We should probably classify the crawler identification problem as impossible and move on. Fewer resources wasted and easier automation for everyone. Assuming a crawler is malicious is narrow-minded.


ddorian43 2041 days ago

https://developers.google.com/search/docs/advanced/verifying...


Xylakant 2041 days ago

This helps verify that a bot announcing itself as Googlebot really is Googlebot. It doesn’t help identify a bot that pretends to be a user/browser.
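
For reference, that kind of verification is usually done as a reverse DNS lookup followed by a confirming forward lookup. Below is a rough sketch, not taken from the linked page: the `verify_googlebot` name, the injectable `reverse`/`forward` resolver parameters, and the two accepted domain suffixes are assumptions made for illustration, and it handles IPv4 only.

```python
import socket

# Assumed suffixes for Googlebot hostnames; check Google's current
# documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def verify_googlebot(ip, reverse=socket.gethostbyaddr,
                     forward=socket.gethostbyname_ex):
    """Return True if ip passes a reverse-then-forward DNS check.

    reverse and forward default to the real resolvers and can be
    swapped out for offline testing.
    """
    try:
        hostname = reverse(ip)[0]          # reverse (PTR) lookup
    except OSError:                        # covers socket.herror/gaierror
        return False
    if not hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]   # forward lookup, IPv4 list
    except OSError:
        return False
    # The PTR record alone can be faked by whoever controls the IP's
    # reverse zone, so the forward lookup must point back to the same IP.
    return ip in addresses
```

As noted above, this only catches bots that falsely claim to be Googlebot in their user agent. A bot sending a browser user agent never triggers the check at all.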
