| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ljusten 1299 days ago

IIUC, rsync computes a relatively expensive Rabin-Karp rolling hash (https://librsync.github.io/rabinkarp_8c_source.html) and performs a hash map lookup for every byte. Hash map lookups might not be very cache friendly for larger data sets. In comparison, cdc_rsync only computes

  hash = (hash << 1) + random_table[data[n]];
  bool chunk_boundary = (hash & magic_pattern) == 0;

per byte. That's only a few ops and very cache friendly. The random table only has 256 entries, 8 bytes each, so it easily fits into L1.