by EdSchouten 955 days ago
You can use a CDC (content-defined chunking) algorithm, but if you know that duplication mostly occurs at power-of-two boundaries, there is no need to. Splitting the data into a binary tree of power-of-two-aligned chunks and deduplicating the nodes of that tree is sufficient.
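Here is a minimal sketch of what deduplicating on a binary tree basis could look like. It assumes a content-addressed store (a plain dict here) and splits each range of leaves at the largest power of two smaller than its length, the same rule RFC 6962 uses for its Merkle trees. That split rule, the names `LEAF_SIZE`, `store_blob` and `load_blob`, and the tiny leaf size are all illustrative assumptions, not necessarily what the commenter had in mind.

```python
import hashlib

# Hypothetical leaf size in bytes; tiny so the example is easy to follow.
LEAF_SIZE = 4


def split_point(n: int) -> int:
    """Largest power of two strictly less than n (requires n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def _put(store: dict, data: bytes, lo: int, hi: int) -> bytes:
    """Store the subtree covering leaves [lo, hi) and return its digest."""
    if hi - lo == 1:
        chunk = data[lo * LEAF_SIZE:(lo + 1) * LEAF_SIZE]
        # Prefix 0x00 so leaf hashes can never collide with inner-node hashes.
        digest = hashlib.sha256(b"\x00" + chunk).digest()
        store.setdefault(digest, chunk)
        return digest
    # Split at a power-of-two boundary relative to the start of the range.
    k = split_point(hi - lo)
    left = _put(store, data, lo, lo + k)
    right = _put(store, data, lo + k, hi)
    digest = hashlib.sha256(b"\x01" + left + right).digest()
    # If an identical subtree was stored earlier, this is a no-op.
    store.setdefault(digest, (left, right))
    return digest


def store_blob(store: dict, data: bytes) -> bytes:
    """Add a blob to the content-addressed store; return its root digest."""
    n_leaves = max(1, -(-len(data) // LEAF_SIZE))  # ceiling division
    return _put(store, data, 0, n_leaves)


def load_blob(store: dict, digest: bytes) -> bytes:
    """Reassemble a blob by walking the tree down from its root digest."""
    node = store[digest]
    if isinstance(node, bytes):
        return node
    left, right = node
    return load_blob(store, left) + load_blob(store, right)


if __name__ == "__main__":
    store = {}
    a = bytes(range(32))      # 8 distinct leaves
    b = a[:28] + b"XXXX"      # differs from a only in the last leaf
    store_blob(store, a)
    print(len(store))         # 15: 8 leaves + 7 inner nodes
    store_blob(store, b)
    print(len(store))         # 19: 1 new leaf + 3 new inner nodes on its path
```

Storing `b` after `a` only adds the changed leaf and the inner nodes on its path to the root. The flip side is that inserting a single byte at the front of `a` shifts every boundary, so almost nothing is shared; that shifted-data case is what CDC is designed to handle.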