Hacker News new | ask | show | jobs
by smjburton 407 days ago
> When an article is requested multiple times, we memorize – or cache – its content in the datacenter closest to the user. If an article hasn’t been requested in a while, its content needs to be served from the core data center.

Maybe a similar system needs to be set up so that bot requests need to present their latest cache or hash ID of the requested content before a full request can be granted. This way, if the local cache is recent, it doesn't burden the server with requests for content they've already seen, and they can otherwise serve their users information based on the version they have stored locally.

1 comments

This already exists in the form of ‘if-modified-since’ http cache property. A requester can ask the server if the data they currently have is stale within the current spec.

In my experience, the problematic crawlers choose not to implement this feature.