Hacker News new | ask | show | jobs
by EdSchouten 836 days ago
Are there actually practical cases where Rabin-Karp hashing is what dominates the running time of an application? The naive implementation already gives you 0.75 GB/s. Seems pretty fast.
3 comments

Compression, synchronization and backup systems often use rolling hash to implement "content-defined chunking", an effective form of deduplication.

In optimized implementations, Rabin-Karp is likely to be the bottleneck. See for instance https://github.com/facebook/zstd/pull/2483 which replaces a Rabin-Karp variant by a >2x faster Gear-Hashing.

I found it to be a good pedagogical tool. In terms of actual use cases, maybe something like rsync'ing a large directory where only a few files have changed? It starts to get pretty contrived though.
Maybe implementing `foo LIKE '%bar%'` in a columnar DB backed by SSDs. Even if you're not strictly CPU-bound, there's a scale where you might care about the power use.