by anarkafkas
1826 days ago
Great question! Traversing the entire collection can take a while, so it's definitely possible for a new doc to be added in the meantime. Whether that new doc gets traversed depends on its position in the collection's ordering. It definitely won't be traversed twice. If it's positioned before the docs in the current batch, it won't be traversed. If it's positioned after the current batch, it will be. So that also depends on whether you're traversing a plain collection or a Query. Catching every doc added during the traversal requires a different strategy, such as adding a temporary field to each traversed doc and then querying for the docs that don't have that field. It's definitely something that we can implement soon!
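To make the ordering argument concrete, here is a minimal in-memory sketch. It assumes each batch resumes from a cursor (the last key already processed) rather than from a numeric offset. That is an assumption for illustration, not a confirmed description of the traverser, and `traverse_with_cursor` and `concurrent_writes` are hypothetical names.

```python
from bisect import bisect_right, insort


def traverse_with_cursor(keys, batch_size, after_batch=None):
    """Visit a sorted list of keys in batches, resuming after the last key seen."""
    visited = []
    cursor = None  # last key processed so far (assumed resume point)
    batch_number = 0
    while True:
        # Resume strictly after the cursor, wherever it now sits in the list.
        start = 0 if cursor is None else bisect_right(keys, cursor)
        batch = keys[start:start + batch_size]
        if not batch:
            return visited
        visited.extend(batch)
        cursor = batch[-1]
        batch_number += 1
        if after_batch is not None:
            after_batch(batch_number, keys)  # simulates writes between batches


def concurrent_writes(batch_number, keys):
    """After the first batch, add one doc before the cursor and one after it."""
    if batch_number == 1:
        insort(keys, "a")  # sorts before the batch that was just processed
        insort(keys, "g")  # sorts after it


docs = ["b", "d", "f", "h"]
result = traverse_with_cursor(docs, 2, concurrent_writes)
print(result)
```

Under that assumption, the doc added before the cursor is never visited, the doc added after it is, and nothing is visited twice.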
If you add a new document before the end of the current batch, the offset used for the start of the next batch will be too low, so documents at the boundary are processed twice. If you delete a document, the offset will be too high, so some documents are skipped.
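A minimal sketch of that drift, assuming the traversal pages by numeric offset over an ordered list. The list stands in for the collection, and `traverse_with_offset`, `insert_early`, and `delete_early` are hypothetical names used only for this illustration.

```python
def traverse_with_offset(docs, batch_size, after_batch=None):
    """Visit docs in batches, starting each batch at a running numeric offset."""
    visited = []
    offset = 0
    batch_number = 0
    while True:
        batch = docs[offset:offset + batch_size]
        if not batch:
            return visited
        visited.extend(batch)
        offset += len(batch)
        batch_number += 1
        if after_batch is not None:
            after_batch(batch_number, docs)  # simulates writes between batches


def insert_early(batch_number, docs):
    """After the first batch, add a doc that sorts before everything seen so far."""
    if batch_number == 1:
        docs.insert(0, "a")


def delete_early(batch_number, docs):
    """After the first batch, delete a doc that was already processed."""
    if batch_number == 1:
        docs.remove("b")


with_insert = traverse_with_offset(["b", "d", "f", "h"], 2, insert_early)
with_delete = traverse_with_offset(["b", "d", "f", "h"], 2, delete_early)
print(with_insert)
print(with_delete)
```

In the first run the insert shifts everything right by one, so the boundary doc `"d"` is processed twice. In the second run the delete shifts everything left by one, so `"f"` is never processed.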
I think the temporary field solution might work, but you need stable indexing on the set being traversed. So I think you need to add the temporary field to new documents and exclude them in the query, and you need to only soft-delete while traversing and exclude the soft-deleted documents after the query. Then you can clean up afterwards by removing the temporary fields and the soft-deleted documents.
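Here is one way that could look, as a rough in-memory sketch rather than a tested implementation against a real database. The dict stands in for the collection, and the field names `temp_marker` and `soft_deleted`, along with every function name, are hypothetical.

```python
def stable_query(store):
    """Docs in the traversal set: everything without the marker, ordered by id."""
    return sorted(
        (doc for doc in store.values() if not doc.get("temp_marker")),
        key=lambda doc: doc["id"],
    )


def create_during_traversal(store, doc_id):
    """New docs carry the marker, so the query above never sees them."""
    store[doc_id] = {"id": doc_id, "temp_marker": True}


def soft_delete(store, doc_id):
    """Flag the doc instead of removing it, so offsets stay valid."""
    store[doc_id]["soft_deleted"] = True


def traverse(store, batch_size, after_batch=None):
    processed = []
    offset = 0
    batch_number = 0
    while True:
        batch = stable_query(store)[offset:offset + batch_size]
        if not batch:
            return processed
        # Soft-deleted docs are filtered out only after the query has run.
        processed.extend(d["id"] for d in batch if not d.get("soft_deleted"))
        offset += len(batch)
        batch_number += 1
        if after_batch is not None:
            after_batch(batch_number, store)


def cleanup(store):
    """Hard-delete soft-deleted docs and strip the marker from the rest."""
    for doc_id in list(store):
        if store[doc_id].get("soft_deleted"):
            del store[doc_id]
        else:
            store[doc_id].pop("temp_marker", None)


def concurrent_writes(batch_number, store):
    if batch_number == 1:
        create_during_traversal(store, "a")
        soft_delete(store, "b")
        soft_delete(store, "f")


store = {doc_id: {"id": doc_id} for doc_id in ["b", "d", "f", "h"]}
processed = traverse(store, 2, concurrent_writes)
cleanup(store)
remaining = sorted(store)
print(processed, remaining)
```

In this sketch the set being paged over never changes size during the traversal, so no doc is processed twice or skipped by offset drift. Docs created during the traversal, like `"a"` here, are deliberately left out of this pass and would need a follow-up pass if they also have to be processed.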