HN Mirror



	by prmoustache 1035 days ago
	ATMO you shouldn't have to keep track of which crawler bots exist and maintain a deny list. It should be the opposite: only expressly allowed content should be crawled, by maintaining allow lists.


wraptile 1035 days ago

You have been able to do the opposite since the inception of robots.txt: use User-agent: * with Disallow: /, then whitelist Googlebot and whatnot. Most of the web is already configured this way. Just check the robots.txt of any major website, e.g. https://twitter.com/robots.txt
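For illustration, a minimal sketch of that allow-list pattern, checked with Python's standard-library urllib.robotparser. It assumes a hypothetical site at example.com and a made-up crawler name, SomeOtherBot. The whitelisting uses only the original rules, where an empty Disallow: line means nothing is disallowed for that user agent.

```python
from urllib.robotparser import RobotFileParser

# Allow-list robots.txt using only the original directives (hypothetical site).
# The Googlebot group has an empty Disallow, i.e. "nothing is disallowed";
# every other crawler falls through to the catch-all group, which blocks "/".
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)
print(parser.can_fetch("Googlebot", "https://example.com/page"))     # True
print(parser.can_fetch("SomeOtherBot", "https://example.com/page"))  # False
```

Any crawler without its own group lands in the User-agent: * group, so new bots are blocked by default without anyone having to know they exist.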


xnx 1035 days ago

The Allow: directive was an extension to robots.txt added later.
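A minimal sketch of what the Allow: extension adds: opening one path inside an otherwise blocked site. A Disallow-only file could express this only by listing every other path to block. The /public/ path, example.com, and AnyBot are hypothetical. Python's urllib.robotparser applies the first matching rule in file order, while RFC 9309 (which standardized Allow) picks the longest matching path. The Allow line is therefore placed first so both interpretations agree.

```python
from urllib.robotparser import RobotFileParser

# Block everything except /public/ for all crawlers (hypothetical paths).
# Allow comes first because urllib.robotparser uses the first matching rule;
# an RFC 9309 longest-match parser would reach the same result either way.
ROBOTS_TXT = """\
User-agent: *
Allow: /public/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AnyBot", "https://example.com/public/about"))  # True
print(parser.can_fetch("AnyBot", "https://example.com/private/data"))  # False
```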
