by mitchtbaum
3804 days ago
> 1) detect pages that had changed since the last crawl, to avoid recrawling pages that hadn't changed?

Web clients usually use the ETag header for this (https://en.wikipedia.org/wiki/HTTP_ETag), as far as I can see. If a web app or server doesn't support it, you can compute your own hash of the page content and compare it against the one from the last crawl, instead of leaving that check to HTTP.
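
Roughly what I mean, as a minimal Python sketch rather than a finished crawler. The names `has_changed`, `cache`, and `fetch` are made up for illustration; `fetch` is assumed to be any callable you supply that takes a URL and request headers and returns `(status, response_headers, body)`, with the response header key spelled `ETag`. It sends `If-None-Match` when an ETag was stored, and falls back to comparing a SHA-256 of the body otherwise.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical fetcher signature (an assumption, not a real library API):
# (url, request_headers) -> (status_code, response_headers, body_bytes)
Fetcher = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


def has_changed(url: str, cache: Dict[str, dict], fetch: Fetcher) -> bool:
    """Return True if the page at `url` differs from the last crawl.

    `cache` maps url -> {"etag": ..., "sha256": ...} from the previous crawl
    and is updated in place.
    """
    prev = cache.get(url, {})

    # Ask the server to do the comparison if we have an ETag from last time.
    headers = {}
    if prev.get("etag"):
        headers["If-None-Match"] = prev["etag"]

    status, resp_headers, body = fetch(url, headers)

    # 304 Not Modified: the server confirmed our copy is still current.
    if status == 304:
        return False

    # Otherwise (no ETag support, or the ETag changed) compare our own hash.
    digest = hashlib.sha256(body).hexdigest()
    changed = digest != prev.get("sha256")

    # Remember both values for the next crawl.
    cache[url] = {"etag": resp_headers.get("ETag"), "sha256": digest}
    return changed
```

Note that the hash fallback still downloads the whole page; it only saves you the processing afterwards, whereas a 304 saves the transfer as well.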