by cwillu
804 days ago
Accessing a directly referenced page is common, in order to receive the noindex header and/or meta tag, whose semantics are not implied by “Disallow: /”. And then all the links are to external domains, which aren't subject to the first site's robots.txt.
Although the crawler should probably ignore the entire HTML body. But it does feel like a grey area if I accept your first point.
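
To make the distinction above concrete, here is a rough Python sketch of the two separate signals: whether robots.txt permits fetching a URL, and whether the fetched response asks not to be indexed via an `X-Robots-Tag` header or a robots meta tag. The helper names (`robots_allows`, `is_noindex`, `RobotsMetaParser`), the `ExampleBot` user agent, and the example.com URL are made up for illustration; this is not how any particular crawler is implemented, only one way the check could look.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_allows(robots_txt, user_agent, url):
    """Hypothetical helper: does this robots.txt text permit fetching url?"""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def is_noindex(headers, html):
    """Hypothetical helper: does the response itself ask not to be indexed?

    `headers` is assumed to be a plain dict keyed by "X-Robots-Tag".
    """
    header_value = headers.get("X-Robots-Tag", "")
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives


# robots.txt only answers "may I fetch this?"; it carries no noindex signal.
blocked = not robots_allows("User-agent: *\nDisallow: /", "ExampleBot",
                            "https://example.com/page")

# The noindex signal is only visible once the page has been requested.
header_says_noindex = is_noindex({"X-Robots-Tag": "noindex"}, "")
meta_says_noindex = is_noindex({}, '<meta name="robots" content="noindex, nofollow">')
```

Note that the header check needs no body at all, which fits the suggestion that a crawler in this position could ignore the HTML body; the meta tag variant is the part that forces it to read at least the document head.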