Hacker News
by mitchtbaum 3804 days ago
> 1) detect pages that had changed since the last crawl, to avoid recrawling pages that hadn't changed?

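Usually web clients use ETags for this ( https://en.wikipedia.org/wiki/HTTP_ETag ), as far as I can see. If a web app or server doesn't support them, you could compute your own hash of the response body and compare it with the one from the last crawl, instead of letting the HTTP layer handle that check. Note that the hash approach still downloads the page; it only saves you from reprocessing it.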
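
A rough sketch of both approaches in Python, using only the standard library. The function names, the `opener` parameter, and the choice of SHA-256 are my own assumptions for illustration, not anything from the original question: send the cached ETag as `If-None-Match`, treat a 304 as "unchanged", and fall back to comparing a body hash when the server gives no help.

```python
import hashlib
import urllib.error
import urllib.request


def content_hash(body: bytes) -> str:
    """Hash the raw body; SHA-256 is an arbitrary choice here."""
    return hashlib.sha256(body).hexdigest()


def fetch_if_changed(url, cached_etag=None, cached_hash=None,
                     opener=urllib.request.urlopen):
    """Return (changed, body, etag, digest) for one crawl of `url`.

    `opener` is a hypothetical hook so the network call can be swapped out.
    """
    request = urllib.request.Request(url)
    if cached_etag:
        # Ask the server to skip the body if the ETag still matches.
        request.add_header("If-None-Match", cached_etag)
    try:
        with opener(request) as response:
            body = response.read()
            etag = response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not Modified: nothing was downloaded, keep the cached values.
            return False, None, cached_etag, cached_hash
        raise
    # No 304 (or no ETag support): compare our own hash of the body.
    digest = content_hash(body)
    return digest != cached_hash, body, etag, digest
```

You would store the returned `etag` and `digest` per URL and pass them back in on the next crawl. Pages with dynamic bits (timestamps, ads) will hash differently every time, so in practice you might want to hash only the extracted content.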