Hacker News
by
EdSchouten
955 days ago
You can use a content-defined chunking (CDC) algorithm, but if you know that duplication mostly occurs at power-of-two boundaries, there is no need for one. Deduplicating on a binary tree basis is sufficient.
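
To make the binary tree idea concrete, here is a rough sketch of how it could look, not a description of any particular system. The data is split recursively at power-of-two offsets, every node is hashed, and nodes are kept in a content-addressed dict, so identical aligned subtrees are stored only once. The names `LEAF_SIZE`, `store_tree` and `load_tree`, the `L`/`N` node prefixes, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

# Hypothetical leaf size in bytes, kept tiny so the demo is easy to follow.
LEAF_SIZE = 4
DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def store_tree(data: bytes, store: dict) -> bytes:
    """Store `data` as a binary hash tree in `store`; return the root digest."""
    if len(data) <= LEAF_SIZE:
        node = b"L" + data  # leaf node: prefix byte plus raw bytes
    else:
        # Split at the largest power of two strictly below len(data),
        # so subtree boundaries always fall on power-of-two offsets.
        half = 1 << ((len(data) - 1).bit_length() - 1)
        left = store_tree(data[:half], store)
        right = store_tree(data[half:], store)
        node = b"N" + left + right  # inner node: prefix plus two child digests
    digest = hashlib.sha256(node).digest()
    store[digest] = node  # identical nodes map to the same key
    return digest


def load_tree(digest: bytes, store: dict) -> bytes:
    """Rebuild the original bytes from a root digest."""
    node = store[digest]
    if node[:1] == b"L":
        return node[1:]
    left = node[1:1 + DIGEST_LEN]
    right = node[1 + DIGEST_LEN:1 + 2 * DIGEST_LEN]
    return load_tree(left, store) + load_tree(right, store)


if __name__ == "__main__":
    store = {}
    a = b"AAAABBBBAAAABBBB"  # second half repeats the first half
    b = b"AAAABBBBCCCCDDDD"  # shares its first half with `a`
    root_a = store_tree(a, store)
    print(len(store))        # 4 distinct nodes for `a`
    root_b = store_tree(b, store)
    print(len(store))        # 8: only 4 new nodes were added for `b`
    assert load_tree(root_a, store) == a
    assert load_tree(root_b, store) == b
```

The catch is the one already implied above: this only deduplicates when the repeated data sits at the same power-of-two alignment. Shift the input by a single byte and every leaf changes, which is the case a CDC algorithm is meant to handle.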