by spankalee
1829 days ago
It looks like adding or removing documents before the end of the current batch may cause /existing/ documents to be skipped or processed twice. If you add a new document before the end of the current batch, the offset used for the start of the next batch will be too low, so documents at the boundary get processed twice. If you delete a document, the offset will be too high and some documents get skipped. I think the temporary field solution might work, but you need stable indexing on the set being traversed. So I think you need to add the temporary field to new documents and exclude them in the query, and only soft-delete while traversing, excluding the soft-deleted documents post-query. Afterwards you can clean up by removing the temporary fields and the soft-deleted documents.
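To make the failure mode concrete, here is a minimal Python sketch. It assumes plain offset/limit pagination over a sorted in-memory list standing in for the collection; the names `traverse`, `insert_early` and `delete_early` are made up for illustration and are not from the article or any database API.

```python
import bisect


def traverse(docs, batch_size, mutate=None):
    """Offset-paginate over the sorted list `docs`.

    `mutate(docs, batch_no)` is a hypothetical hook that runs after each
    batch and may add or remove documents, simulating concurrent writes.
    Returns every document the traversal processed, in order.
    """
    seen = []
    offset = 0
    batch_no = 0
    while offset < len(docs):
        batch = docs[offset:offset + batch_size]
        seen.extend(batch)
        if mutate is not None:
            mutate(docs, batch_no)
        # The next offset assumes nothing before it has moved.
        offset += batch_size
        batch_no += 1
    return seen


def insert_early(docs, batch_no):
    # After the first batch, add a document that sorts inside it.
    if batch_no == 0:
        bisect.insort(docs, "aa")


def delete_early(docs, batch_no):
    # After the first batch, hard-delete a document that was in it.
    if batch_no == 0:
        docs.remove("a")


with_insert = traverse(list("abcdef"), 2, insert_early)
with_delete = traverse(list("abcdef"), 2, delete_early)

print(with_insert)  # ['a', 'b', 'b', 'c', 'd', 'e', 'f'] -> "b" processed twice
print(with_delete)  # ['a', 'b', 'd', 'e', 'f'] -> "c" skipped
```

In both runs the damage is done to documents that existed before the traversal started, which is why excluding new documents from the query and deferring real deletes until the end would keep the offsets stable.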
|
I'll actually write up some tests to confirm that we don't process any docs twice!