Hacker News
by motherwell 6232 days ago
Just for clarification:

1. robots.txt excludes CRAWLING, i.e. downloading, but NOT indexing, i.e. including a URL or site in a database of known URLs or sites.

2. The robots meta tag disallows INDEXING but NOT crawling.
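To make the split concrete, here is a minimal sketch of the two separate checks a crawler might make. The robots.txt rules, the user agent "ExampleBot", and the URLs are hypothetical. The function names `may_crawl` and `may_index_content` are also illustrative. robots.txt is evaluated with Python's standard `urllib.robotparser` and only answers "may I fetch this URL?". The meta tag is read from HTML that has already been downloaded and only answers "may I index this page's content?".

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


class MetaRobotsParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attributes without a value arrive as None, so normalise to "".
        attr_map = {k.lower(): (v or "") for k, v in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for token in attr_map.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def may_crawl(robots_lines, user_agent, url):
    """robots.txt governs fetching (crawling) only."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    return rp.can_fetch(user_agent, url)


def may_index_content(html):
    """The robots meta tag governs indexing of an already-fetched page."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" not in parser.directives
```

One consequence of this design is that a noindex tag can only be seen on a page the crawler was allowed to fetch in the first place.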

So, although most modern search engines do not do this, it is semantically correct to index a site or URL that is disallowed via robots.txt using link data alone.
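A rough sketch of what "indexing from link data alone" could look like follows. The robots.txt, the link triples, and the function `build_index` are all hypothetical, and real engines differ. A disallowed URL is never downloaded. It still gets an index entry assembled purely from the anchor text of links pointing to it, which were found on pages that were allowed to be crawled.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com, for illustration only.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]

# Hypothetical (source page, target URL, anchor text) triples, gathered
# from other pages that *were* allowed to be crawled.
DISCOVERED_LINKS = [
    ("https://a.example/", "https://example.com/private/report.html", "annual report"),
    ("https://b.example/", "https://example.com/private/report.html", "the report"),
    ("https://a.example/", "https://example.com/about.html", "about us"),
]


def build_index(robots_lines, user_agent, links):
    """Record every linked URL, whether or not robots.txt lets us fetch it."""
    rp = RobotFileParser()
    rp.parse(robots_lines)
    index = {}
    for _source, target, anchor in links:
        entry = index.setdefault(target, {"anchors": [], "crawlable": None})
        # Link data is stored regardless of robots.txt ...
        entry["anchors"].append(anchor)
        # ... which only decides whether the page body may be downloaded.
        entry["crawlable"] = rp.can_fetch(user_agent, target)
    return index


if __name__ == "__main__":
    for url, entry in build_index(ROBOTS_TXT, "ExampleBot", DISCOVERED_LINKS).items():
        print(url, entry)
```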