by oofbey
260 days ago
The clever trick is how it recognizes insertions. The standard trick of computing hashes on fixed-size blocks works efficiently for substitutions but is totally defeated by an insertion or deletion, because every block after the edit is shifted and hashes differently. Instead, with CDC the block boundaries are defined by the content, so an insertion only disturbs the block it lands in; the boundaries after it move along with the data, so it can tell the subsequent blocks are unchanged. I haven’t read the CDC paper, but I’m guessing they just use some probabilistic hash function to define certain strings as block boundaries.
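To make the guess above concrete, here is a rough Python sketch of the idea, not the scheme from the paper. It assumes a 16-byte window and cuts a block wherever the CRC32 of the last 16 bytes has its low 6 bits equal to zero. The names `cdc_chunks`, `fixed_chunks`, `WINDOW` and `MASK` are made up for illustration, and the hash is recomputed from scratch at every position to keep it short; real chunkers use a rolling hash (Rabin fingerprints, buzhash, gear) so each step is O(1).

```python
import random
import zlib

WINDOW = 16          # bytes of context that decide a boundary (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero, so ~64-byte blocks on average


def cdc_chunks(data: bytes) -> list:
    """Split data wherever the hash of the trailing WINDOW bytes matches the mask."""
    out, start = [], 0
    for end in range(WINDOW, len(data) + 1):
        # The cut decision depends only on the last WINDOW bytes,
        # never on the absolute offset in the file.
        if zlib.crc32(data[end - WINDOW:end]) & MASK == 0:
            out.append(data[start:end])
            start = end
    if start < len(data):
        out.append(data[start:])  # trailing partial block
    return out


def fixed_chunks(data: bytes, size: int = 64) -> list:
    """Baseline: cut at fixed offsets regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
insertion = b"HELLO"
edited = original[:1000] + insertion + original[1000:]

# Blocks that exist after the edit but not before it.
cdc_new = set(cdc_chunks(edited)) - set(cdc_chunks(original))
fixed_new = set(fixed_chunks(edited)) - set(fixed_chunks(original))
print("new blocks with CDC:  ", len(cdc_new))
print("new blocks with fixed:", len(fixed_new))
```

With this setup, a block can only change if it overlaps the insertion or starts within `WINDOW` bytes after it, so at most `len(insertion) + WINDOW` blocks differ with CDC no matter how long the file is. With fixed offsets, essentially every block after the edit point is new.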