by dmn001
3154 days ago
There is no issue with parsing and scraping in the same loop as long as there is caching in there as well. You don't want to be hitting the server repeatedly whilst you're debugging. A project like Scrapy should have caching on by default, but it seems to be an afterthought. Repeatable and reproducible parsing of cached websites is necessary, e.g. if you find additional data fields that you want to parse without downloading the entire site all over again.
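For reference, a rough sketch of what turning the cache on looks like in a Scrapy project's settings.py. The setting names are Scrapy's own HTTP cache settings; the values are only one plausible choice for a debugging workflow, not a recommendation from this thread.

```python
# settings.py (sketch): enable Scrapy's built-in HTTP cache, which is off by default.
HTTPCACHE_ENABLED = True

# 0 means cached responses never expire, so re-running the spider while
# debugging the parser does not hit the server again.
HTTPCACHE_EXPIRATION_SECS = 0

# Directory for cached responses; relative paths are resolved against the
# project's data directory. The name here is just an example value.
HTTPCACHE_DIR = "httpcache"

# Filesystem-backed storage shipped with Scrapy.
HTTPCACHE_STORAGE = "scrapy.extensions.httpcache.FilesystemCacheStorage"
```

With something like this in place, parsing changes can be tested against the cached pages rather than the live site.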
|
With caching, you are at the mercy of whatever third-party caching scheme is used under the hood, and the raw pulled data can disappear at any time without your explicit command (e.g., if some library gets updated and decides that this invalidates the caching scheme).
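One way to avoid depending on a library's cache is to write the raw responses to disk yourself and parse from those files. A minimal sketch, assuming plain HTTP GETs via the standard library; `RAW_DIR`, `raw_path`, and `fetch_raw` are hypothetical names made up for illustration, not part of Scrapy or any other library.

```python
import hashlib
from pathlib import Path
from urllib.request import urlopen

RAW_DIR = Path("raw_pages")  # hypothetical location for the raw downloads


def raw_path(url, raw_dir=RAW_DIR):
    """Map a URL to a stable file name using the SHA-256 of the URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return raw_dir / f"{digest}.html"


def _download(url):
    """Fetch the raw response body over the network."""
    with urlopen(url) as response:
        return response.read()


def fetch_raw(url, raw_dir=RAW_DIR, download=_download):
    """Return the raw bytes for url, downloading only if no copy is on disk."""
    path = raw_path(url, raw_dir)
    if path.exists():
        return path.read_bytes()
    body = download(url)
    raw_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(body)
    return body
```

The files only go away when you delete them, and the parsing step can be re-run over the directory as often as needed without touching the server.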