by lucideer 959 days ago
There's no real cat & mouse game here (yet*) - sites don't do anything to mitigate this. Sites deliberately make their content available to robots to gain SEO traction: they're left with the choice of allowing this kind of bypass or hurting their own SEO.

* I say "yet" because there could conceivably be ways to mitigate this, but afaik most would involve individual deals/contracts between every search engine & every subscription website - Google's monopoly simplifies this somewhat, but there's not much of an incentive from Google's perspective to facilitate this at any scale.


Google publishes IP ranges for GoogleBot. You can also reverse-lookup the request IP address - the resolved domain should in turn resolve to the original address.
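
For anyone who wants to try the reverse-lookup check, here is a rough Python sketch of it, not a definitive implementation. The function names and the suffix list are my own assumptions (the suffixes are the domains I believe Google documents for its crawlers, so check the current docs before relying on them). The resolvers are passed in as parameters so the logic can be exercised without touching DNS. Addresses are compared as plain strings, which is fine for IPv4 but may need normalising for IPv6.

```python
import socket

# Assumed list of crawler hostname suffixes; verify against Google's docs.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return None


def forward_lookup(hostname):
    """Return the set of addresses hostname resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check for a request IP (hypothetical helper)."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # The leading dot in each suffix rejects names like "evilgooglebot.com".
    if not hostname.endswith(suffixes):
        return False
    # The resolved domain must in turn resolve back to the original address.
    return ip in forward(hostname)
```

Calling `is_verified_crawler(request_ip)` with the default resolvers does two DNS lookups per call, so in practice you would probably cache the result per IP, or compare against the published IP ranges instead.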
Does anyone else remember 10 years ago when Google would penalize sites for serving different content to GoogleBot than to normal users? Those were the days.
> Google would penalize sites for serving different content to GoogleBot than to normal users

Listed under spam policies:

https://developers.google.com/search/docs/essentials/spam-po...

   "Cloaking refers to the practice of presenting different content to users and search engines with the intent to manipulate search rankings and mislead users"
The top of the page says sites that violate the policies may "rank lower or not appear in results at all".
It's infuriating when you see part of your desired information in the search results and then open the page to find a paywall. IIRC ExpertsExchange were doing that for a long enough time that it was obvious the policy was not enforced. At least not evenly.