

by rafaelm 379 days ago

There are cases where Google finds a URL that is blocked in robots.txt (through external or internal links), and the page can still be indexed and show up in the search results even though Google can't crawl it [1].

The only way to be sure that it stays out of the results is to use a noindex tag, which, as you mentioned, search engine bots need to "read" in the code. If the URL is blocked in robots.txt, the bot never fetches the page, so the "noindex" cannot be read.
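To make the conflict concrete, here is a rough sketch in Python that checks whether a crawler would even be allowed to fetch a page and therefore see its noindex tag. It uses the standard library's `urllib.robotparser`, which only approximates how Google interprets robots.txt. The function name `noindex_can_be_read`, the sample rules, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def noindex_can_be_read(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets the bot fetch the URL.

    Hypothetical helper: a noindex tag only has a chance of working
    when the bot is allowed to crawl the page that carries it.
    """
    parser = RobotFileParser()
    # Parse the rules from a string instead of fetching them over the network.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative rules, not taken from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

blocked_page = noindex_can_be_read(ROBOTS_TXT, "https://example.com/private/page.html")
open_page = noindex_can_be_read(ROBOTS_TXT, "https://example.com/public/page.html")

print(blocked_page)  # crawling is disallowed, so a noindex tag here would go unseen
print(open_page)     # crawling is allowed, so a noindex tag here can be read
```

If this returns False for a page you want out of the results, the fix is to lift the robots.txt block for that URL so the noindex can actually be seen.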

[1] https://developers.google.com/search/docs/crawling-indexing/... (refer to the red "Warning" section)