by leotaku
1406 days ago
Some CDC implementations I have seen take a desired "average" chunk size in addition to a minimum and a maximum. Once a chunk grows past the desired average, the test for recognizing a byte sequence as a chunk boundary becomes more forgiving. Other solutions also go back and retry previously processed sequences with the more forgiving test. However, from what I've seen, these methods generally come at the cost of deduplication and/or speed. The most reliable way to avoid pathological cases seems to be simply setting the minimum chunk size low enough and the maximum high enough.

If you're asking in a purely theoretical sense, I would assume that the possibility of a change affecting non-local chunks is inherent to CDC. With well-chosen parameters, the likelihood of any chunks other than the ones closest to the change being affected just becomes low enough to be negligible.
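To make the first approach concrete, here is a rough Python sketch of a chunker with a minimum, a desired average, and a maximum size, where the boundary test gets easier once the chunk passes the average. It assumes a gear-style rolling hash. Every name and parameter value (`MIN_SIZE`, `AVG_SIZE`, `MAX_SIZE`, the two masks, `next_cut`, `chunks`) is made up for illustration and not taken from any particular implementation.

```python
import random

# Illustrative parameters only; real tools pick their own values.
MIN_SIZE = 2048      # never cut before this many bytes
AVG_SIZE = 8192      # past this point the boundary test is relaxed
MAX_SIZE = 65536     # always cut here if no boundary was found

MASK_STRICT = (1 << 15) - 1   # 15 low bits must be zero (harder to match)
MASK_LOOSE = (1 << 11) - 1    # 11 low bits must be zero (easier to match)

# Gear table: one pseudo-random 64-bit value per byte value.
# The fixed seed is an assumption so that every run chunks identically.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]
_U64 = 0xFFFFFFFFFFFFFFFF


def next_cut(data: bytes, start: int) -> int:
    """Return the exclusive end index of the chunk that begins at `start`."""
    if len(data) - start <= MIN_SIZE:
        return len(data)              # too little left: emit it as one chunk
    end = min(start + MAX_SIZE, len(data))
    avg_point = start + AVG_SIZE
    h = 0
    # The first MIN_SIZE bytes are skipped entirely, not even hashed.
    for i in range(start + MIN_SIZE, end):
        h = ((h << 1) + GEAR[data[i]]) & _U64
        mask = MASK_STRICT if i < avg_point else MASK_LOOSE
        if h & mask == 0:
            return i + 1              # content-defined boundary
    return end                        # forced cut at MAX_SIZE or end of data


def chunks(data: bytes) -> list[bytes]:
    """Split `data` into content-defined chunks."""
    out = []
    pos = 0
    while pos < len(data):
        cut = next_cut(data, pos)
        out.append(data[pos:cut])
        pos = cut
    return out
```

To get a feel for how far an edit propagates, one could chunk a buffer, insert a single byte near the start, chunk it again, and compare `set(chunks(original))` with `set(chunks(edited))`. How many chunks differ depends on the data and the parameters, which is the trade-off described above.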