by Andrex 2761 days ago
The publisher can also control how much is shared with third-party aggregators, either through robots.txt or a paywall.

Which has been the case since search engines became a thing.


That isn't the same at all. A publisher cannot use robots.txt, much less a paywall, to indicate which part of a text may be shared in syndication.
A paywall can. The page displays the snippet the publication is allowing to be shared, while the paywall hides the rest. I believe this is what a few of the bigger US newspapers are doing right now.
OK, but that would require regular readers to have credentials for the paywall. I understood the discussion to be about scraping publicly accessible sites.
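
To make the robots.txt point above concrete, here is a rough sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `news.example` URLs and the `ExampleAggregator` user agent are all made up for illustration. It only shows that the rules match on URL paths, so a crawler is allowed or refused a whole page; nothing in the check refers to a portion of the text on that page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary news site (not from any real publisher).
ROBOTS_TXT = """\
User-agent: *
Disallow: /premium/
""".splitlines()


def build_parser(lines):
    """Parse robots.txt lines into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


parser = build_parser(ROBOTS_TXT)

# The decision is made per URL path: the whole page is either fetchable or not.
free_ok = parser.can_fetch(
    "ExampleAggregator", "https://news.example/articles/story.html"
)
premium_ok = parser.can_fetch(
    "ExampleAggregator", "https://news.example/premium/story.html"
)

print(free_ok, premium_ok)
```

Under these assumptions the first URL is allowed and the second is refused, and in both cases the answer covers the entire page rather than a snippet of it.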