by strken 1038 days ago
If the content is literally the same, the crawler should be able to use If-Modified-Since, right? It still has to make an HTTP request, but it doesn't have to parse or index anything.

If the content is dynamic (e.g. a list of popular articles in a sidebar has changed), then the page will be considered "updated".
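
For illustration, here is a rough sketch of what the crawler side of that might look like. The function name, the URL, and the stored "last crawled" timestamp are made up for the example; the request is only built, not sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def build_conditional_request(url, last_crawled):
    """Build (but do not send) a GET carrying If-Modified-Since.

    `last_crawled` is a hypothetical timestamp the crawler stored on its
    previous visit; it must be a timezone-aware datetime.
    """
    # HTTP dates are expressed in GMT, e.g. "Mon, 01 Mar 2021 12:00:00 GMT".
    stamp = format_datetime(last_crawled.astimezone(timezone.utc), usegmt=True)
    return Request(url, headers={"If-Modified-Since": stamp})


# Placeholder URL and date, for illustration only.
req = build_conditional_request(
    "https://example.com/article",
    datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc),
)
print(req.get_header("If-modified-since"))
```

If the server answers 304 Not Modified, there is no body, so the crawler has nothing to parse or index.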
This is not correct. It's up to the server, under the application's control, to send that header or others, much like sending a <title> tag. The headers take priority, and, as another commenter said, crawlers will do a HEAD request first and not bother with a GET request for the content.
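
A minimal sketch of the server side, assuming the application tracks its own "content modified" time (for example the article body's last edit, ignoring the sidebar). The function name and that timestamp are hypothetical, not any particular framework's API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def status_for(if_modified_since, content_modified):
    """Return 304 if the client's copy is still current, otherwise 200.

    `content_modified` is whatever timezone-aware timestamp the application
    chooses to treat as the page's modification time (an assumption here).
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full response
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable header: ignore it
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    if content_modified.replace(microsecond=0) <= since:
        return 304
    return 200


header = "Mon, 01 Mar 2021 12:00:00 GMT"
print(status_for(header, datetime(2021, 2, 1, tzinfo=timezone.utc)))
print(status_for(header, datetime(2021, 4, 1, tzinfo=timezone.utc)))
```

The point is that a changed sidebar only makes the page "updated" if the application decides to count it in that timestamp.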