| HN Mirror


	by cwillu 804 days ago
	Crawlers commonly fetch a directly referenced page in order to receive the noindex header and/or meta tag, whose semantics are not implied by “Disallow: /”. And then all the links are to external domains, which aren't subject to the first site's robots.txt.
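
	To make the distinction concrete, here's a rough sketch (not any real crawler's logic) of the two separate checks: robots.txt answers "may I fetch this URL?", while the noindex signal only becomes visible once the page has actually been fetched. The function names, the example URL and the plain-dict headers are assumptions for illustration; only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_crawl(robots_txt, user_agent, url):
    """Crawl permission: decided from robots.txt alone, before any fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def is_noindex(headers, html):
    """Index permission: only knowable after the page has been fetched.

    Assumes `headers` is a plain dict keyed exactly as "X-Robots-Tag";
    a real client would need case-insensitive header lookup.
    """
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives


# Hypothetical inputs for illustration.
robots_txt = "User-agent: *\nDisallow: /"
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(may_crawl(robots_txt, "examplebot", "https://example.com/page"))
print(is_noindex({}, page))
```

	A crawler that strictly honours “Disallow: /” never gets to run the second check, which is the gap being described: the noindex signal exists only in the response.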

1 comments

This is a moderately persuasive argument.

Although the crawler should probably ignore the entire HTML body. But it does feel like a grey area if I accept your first pint.

You've been able to convince me to accept his second pint. Friday it is.