by cwillu 804 days ago
Crawlers commonly fetch a directly referenced page in order to receive the noindex header and/or meta tag, whose semantics are not implied by “Disallow: /”.

And then all the links are to external domains, which aren't subject to the first site's robots.txt.
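
To make the distinction concrete, here is a rough sketch using only the Python standard library. The helper names (crawl_allowed, has_noindex, RobotsMetaParser) are made up for illustration, and the header parsing is deliberately simplified. It only shows that "may I fetch this?" and "may I index this?" are answered from different places: the first from robots.txt, the second from the response itself.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for directive in content.split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def crawl_allowed(robots_txt, url, agent="*"):
    """Answer the robots.txt question: may this agent fetch the URL?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def has_noindex(headers, html):
    """Answer the indexing question, which requires the fetched response.

    Simplification: ignores agent-scoped values such as "googlebot: noindex".
    """
    header = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header.split(",")}
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" in header_directives or "noindex" in meta.directives
```

With "User-agent: *" followed by "Disallow: /", crawl_allowed returns False for every path on the site, yet nothing in that file says anything about noindex. That signal is only visible to has_noindex, which needs the headers or body of a page that was actually fetched.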

1 comment

This is a moderately persuasive argument.

Although the crawler should probably ignore the entire HTML body, it does feel like a grey area if I accept your first pint.

You've been able to convince me to accept his second pint. Friday it is.