Strategies for downloading constantly changing data via an API
2 points
by rupestrecampos
444 days ago
I have to download a dataset through an API (a WFS served by GeoServer). The API reports the total number of items and returns at most 1,000 items per request; I can sort by one field and set the start index of each request. The layer has about 1 million items, and I can run at most 5 parallel requests before the API gets overloaded. The problem is that items are added and removed in real time, so by the end of the copy some of what I have copied is already stale and there are new items still to be copied.
So what would you do, or what have you done, in this situation?
Would starting a never-ending loop that crawls the data all day long be evil, or is this something that should be fixed on the provider's side? The API URL is https://geoserver.car.gov.br/geoserver/sicar/wfs and the source data website is https://consultapublica.car.gov.br/publico/imoveis/index
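For concreteness, the paging setup described above might look roughly like the sketch below. It is only an illustration under assumptions: the parameter names are the standard WFS 2.0 GetFeature ones (`typeNames`, `sortBy`, `startIndex`, `count`), while `fetch_page`, the layer name, and the sort field are hypothetical placeholders, and features are assumed to arrive as GeoJSON-style dicts carrying an `id`.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 1000    # maximum items per request, per the post
MAX_PARALLEL = 5    # parallel requests the API tolerates, per the post


def page_params(type_name, sort_field, start_index, count=PAGE_SIZE):
    """Build WFS 2.0 GetFeature query parameters for one page."""
    return {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_name,      # hypothetical layer name supplied by caller
        "sortBy": sort_field,        # a stable sort keeps page order consistent
        "startIndex": start_index,
        "count": count,
        "outputFormat": "application/json",
    }


def crawl(fetch_page, total, type_name, sort_field):
    """Fetch every page with bounded parallelism; return features keyed by id.

    fetch_page(params) is a caller-supplied (hypothetical) function that does
    the HTTP request and returns a list of GeoJSON-style feature dicts.
    """
    offsets = range(0, total, PAGE_SIZE)
    features = {}
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        pages = pool.map(
            lambda off: fetch_page(page_params(type_name, sort_field, off)),
            offsets,
        )
        for page in pages:
            for feature in page:
                # Rows shift between pages while the layer changes, so the same
                # feature can show up twice; keying by id drops the duplicates.
                features[feature["id"]] = feature
    return features
```

Keying by id only handles duplicates. If an item is deleted while the crawl is running, later rows shift toward the start and one of them can be skipped, so a single pass over a changing layer is best treated as approximate.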
Not every application needs real-time data; querying it only occasionally, or every few hours, can be good enough.
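One way to act on that is to take a full snapshot every few hours and reconcile it against the previous one, rather than chasing changes continuously. The sketch below is an assumption about how that could be done, not something the API provides: `diff_snapshots` and `apply_diff` are hypothetical helper names, and each snapshot is assumed to be a dict mapping a feature id to its feature.

```python
def diff_snapshots(previous, current):
    """Compare two {feature_id: feature} snapshots.

    Returns three sets of ids: added, removed, and changed.
    """
    prev_ids = set(previous)
    curr_ids = set(current)
    added = curr_ids - prev_ids
    removed = prev_ids - curr_ids
    changed = {
        fid for fid in prev_ids & curr_ids if previous[fid] != current[fid]
    }
    return added, removed, changed


def apply_diff(local, current, added, removed, changed):
    """Bring a local copy in line with the latest snapshot, in place."""
    for fid in removed:
        local.pop(fid, None)          # drop items the provider deleted
    for fid in added | changed:
        local[fid] = current[fid]     # insert new items, overwrite stale ones
    return local
```

Between runs the local copy is stale by at most one polling interval, which is the trade-off this approach accepts.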