|
|
|
|
|
by ljusten
1252 days ago
|
|
IIUC, rsync computes a relatively expensive Rabin-Karp rolling hash (https://librsync.github.io/rabinkarp_8c_source.html) and performs a hash map lookup for every byte. Hash map lookups might not be very cache friendly for larger data sets. In comparison, cdc_rsync only computes hash = (hash << 1) + random_table[data[n]];
bool chunk_boundary = (hash & magic_pattern) == 0;
per byte. That's only a few ops and very cache friendly. The random table only has 256 entries, 8 bytes each, so it easily fits into L1. |
|