by nulluk 4819 days ago
I wouldn't recommend detecting bots. Google will see it as cloaking and penalise you badly for it, as it goes against their guidelines: http://support.google.com/webmasters/bin/answer.py?hl=en&...

Returning a noindex meta tag or header should be enough for the honest crawlers. If you're worried about dishonest crawlers, then you're fighting a losing battle and have a different problem altogether.
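
To make the noindex suggestion concrete, here is a minimal sketch of a page that sends both signals, the meta tag and the X-Robots-Tag response header. It is written as a bare WSGI app purely as an assumption for illustration; the function name and page body are hypothetical, and your framework will have its own way of setting headers.

```python
def noindex_app(environ, start_response):
    """Hypothetical WSGI app: serves a page that asks crawlers not to index it."""
    body = (
        b"<!doctype html><html><head>"
        b'<meta name="robots" content="noindex">'  # signal 1: meta tag in the HTML
        b"</head><body>Page you want kept out of the index</body></html>"
    )
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "noindex"),  # signal 2: HTTP header, also works for non-HTML files
    ]
    start_response("200 OK", headers)
    return [body]
```

Either signal on its own is normally enough; the header is the one to reach for when the response is a PDF or image and there is no HTML to put a meta tag in.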

2 comments

Yes, meta noindex is the standard way.

But what if you don't want to waste server resources on bots crawling thousands of meta noindex pages? Perhaps you are running some heavy SQL queries on those pages.

You can block crawling with robots.txt, but then Google never fetches the page, so it won't see the noindex and the URLs can still end up indexed.
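
A small sketch of why the two mechanisms conflict, using Python's standard urllib.robotparser. The /search/ path and example.com URLs are hypothetical; the point is only that a compliant crawler checks robots.txt first and never requests a disallowed URL, so any noindex on that page goes unread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the expensive pages.
rules = """\
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks this question before fetching anything.
blocked = parser.can_fetch("Googlebot", "https://example.com/search/?q=widgets")
allowed = parser.can_fetch("Googlebot", "https://example.com/about")

print(blocked)  # False: the page is never fetched, so its noindex is never seen
print(allowed)  # True
```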

If you block bots and send them a 404, I think that's fine. They will see a blank page, and there's nothing to gain from that in ranking. So it's cloaking, perhaps, but I don't think it would be risky.

You could return a 403 error to the bot, if detected. Google may penalize that page in the rankings, but not the whole site. If you display a spammy, keyword-loaded page to the bot, then Google may ban or penalize the whole site.
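
For reference, a rough sketch of what "return a 403 if a bot is detected" could look like, again as a bare WSGI app. Everything here is an assumption for illustration: the token list is hypothetical and incomplete, User-Agent strings are trivially spoofed, and whether this counts as cloaking is exactly what is disputed in this thread.

```python
# Hypothetical, incomplete list of substrings found in crawler User-Agent strings.
BOT_TOKENS = ("googlebot", "bingbot", "slurp")


def is_probable_bot(user_agent: str) -> bool:
    """Naive check: case-insensitive substring match on the User-Agent header."""
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)


def guarded_app(environ, start_response):
    """Refuse detected bots with an empty 403 before doing any expensive work."""
    if is_probable_bot(environ.get("HTTP_USER_AGENT", "")):
        start_response("403 Forbidden", [("Content-Length", "0")])
        return [b""]

    # Only real visitors reach the costly part (e.g. the heavy SQL queries).
    body = b"expensive page"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The check runs before any database work, which is the whole point: the bot gets an empty response and the server skips the expensive query.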