Hacker News
by vrighter 357 days ago
The deduplication in the product I worked on was implemented by me and a colleague of mine, in a custom format. The point of it was to do inline deduplication on a best-effort basis, i.e. to handle the case where the system does NOT have enough memory to store hashes for every single block. If you didn't have enough memory, this might have resulted in some duplicated data, instead of the system slowing down to a crawl by hitting the disk (spinning rust, at the time) for each block we wanted to deduplicate.
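To make the trade-off concrete, here is a rough Python sketch of the general idea, not the actual implementation or format described above. It assumes a memory-bounded hash index with LRU eviction (the eviction policy, SHA-256, and every name here, such as `BestEffortDedupStore` and `max_index_entries`, are made up for illustration). On an index miss it never goes to disk to check for an older copy; it just stores the block again and accepts the duplicate.

```python
import hashlib
from collections import OrderedDict


class BestEffortDedupStore:
    """Toy block store with a memory-bounded hash index (hypothetical design)."""

    def __init__(self, max_index_entries: int):
        self.max_index_entries = max_index_entries
        # digest -> block id, kept in least-recently-used order.
        # LRU is an assumption; the original eviction policy is not stated.
        self.index = OrderedDict()
        # A plain list stands in for the on-disk block log.
        self.blocks = []

    def write(self, data: bytes) -> int:
        """Store a block and return its id, reusing an existing id on an index hit."""
        digest = hashlib.sha256(data).digest()
        block_id = self.index.get(digest)
        if block_id is not None:
            # Hit: the block is already stored, so only refresh its LRU position.
            self.index.move_to_end(digest)
            return block_id
        # Miss: do NOT search the disk for an older copy. Store the block,
        # accepting that it may be a duplicate whose hash was evicted earlier.
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        self.index[digest] = block_id
        if len(self.index) > self.max_index_entries:
            self.index.popitem(last=False)  # drop the least recently used hash
        return block_id

    def read(self, block_id: int) -> bytes:
        return self.blocks[block_id]


store = BestEffortDedupStore(max_index_entries=2)
first = store.write(b"A")
again = store.write(b"A")    # hash still in memory: deduplicated
store.write(b"B")
store.write(b"C")            # index is full, so the hash of b"A" is evicted
late = store.write(b"A")     # hash forgotten: stored a second time
```

With room for only two hashes, the second write of `b"A"` is deduplicated, but the last one is stored again because its hash was evicted in the meantime. Reads stay correct either way; the only cost of running short on memory is some wasted space.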