by jdunck 3766 days ago
If Google (or any other crawler) wanted to play nice with paywalls, they could publish a public key for their bot and put a signature in their User-Agent string that the domain could then verify.

Those signatures could obviously leak, but only on a per-domain basis. Perhaps the domains could have a secure way of bumping the valid key generation after a leak.

2 comments

There are two problems with this.

First, they don't want to. In fact, if a search engine can figure out that a link is going to lead to a paywall, they'll probably want to reduce the ranking of the result, because the user is not going to want results they can't actually look at.

Second, it would be a massive antitrust violation because it would prevent access by competing crawlers. The only way around that is to allow access to anyone who claims they're a crawler, which was the original problem.

The current situation with the WSJ could already be considered an antitrust violation. It's whitelisting one crawler and leaving the other ones out.

Google (and every other major search engine) already provides a way to authenticate bot ownership, namely a reverse DNS lookup:

https://support.google.com/webmasters/answer/80553?hl=en

AFAIK no content provider actually does this check though.
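
For anyone who wants to try it, the check is roughly: reverse-resolve the requesting IP, confirm the hostname falls under the crawler's domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch in Python follows. The function name `is_verified_crawler` and its `reverse`/`forward` parameters are my own, not part of any Google API, and `socket.gethostbyname_ex` only covers IPv4, so treat this as an illustration rather than a production check.

```python
import socket

# Hostname suffixes Google documents for Googlebot; other search engines
# publish their own, so this tuple is an assumption to adjust per crawler.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_crawler(ip, suffixes=GOOGLE_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if `ip` reverse-resolves to an allowed domain and that
    hostname forward-resolves back to the same IP (IPv4 only)."""
    try:
        hostname = reverse(ip)[0]          # PTR lookup
    except OSError:
        return False                       # no PTR record, or lookup failed
    if not hostname.lower().rstrip(".").endswith(suffixes):
        return False                       # hostname is outside the crawler's domain
    try:
        addresses = forward(hostname)[2]   # A-record lookup
    except OSError:
        return False
    # The forward step matters: whoever controls an IP block can set its
    # PTR record to any name they like, including one under googlebot.com.
    return ip in addresses
```

A paywall would call `is_verified_crawler(request_ip)` only for requests whose User-Agent claims to be a crawler, and cache the result per IP so the two DNS lookups are not repeated on every request.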