HN Mirror


by datastack 361 days ago

Exciting!

Yes, the deduplicated approach is superior if you can accept requiring dedicated software to read the data, or if you can rely on a file system that supports it (like Unix with hard links).

I'm looking for a cross-platform solution that is simple and can restore files without any app (in case I don't maintain my app for the next twenty years).

I'm curious whether the software you were working on used a proprietary format, relied on Linux, or used some other method of deduplication.

1 comment

vrighter 358 days ago

The deduplication in the product I worked on was implemented by me and a colleague of mine, in a custom format. The point of it was to do inline deduplication on a best-effort basis, i.e. to handle the case where the system does NOT have enough memory to store hashes for every single block. If you didn't have enough memory, this might have left some data duplicated, instead of slowing down to a crawl by hitting the disk (spinning rust, at the time) for each block we wanted to deduplicate.
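To make the trade-off concrete, here is a minimal Python sketch of best-effort inline deduplication with a memory-bounded hash index. It is not the product's actual format or code: the class name `BestEffortDedupStore`, the 4 KiB block size, the SHA-256 hash, and the LRU eviction policy are all assumptions chosen for illustration. The key property is that a hash missing from the in-memory index is treated as "new" without ever consulting the disk, so a small index costs some duplicate blocks rather than extra I/O.

```python
import hashlib
from collections import OrderedDict

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration


class BestEffortDedupStore:
    """Inline dedup with a bounded in-memory index (illustrative only)."""

    def __init__(self, max_index_entries):
        self.max_index_entries = max_index_entries
        # digest -> block id, kept in least-recently-used order (assumed policy)
        self.index = OrderedDict()
        # A plain list stands in for the on-disk block store.
        self.blocks = []

    def write_block(self, data):
        digest = hashlib.sha256(data).digest()
        block_id = self.index.get(digest)
        if block_id is not None:
            # Known block: reuse it and mark its hash as recently used.
            self.index.move_to_end(digest)
            return block_id
        # Index miss: the block is either new or its hash was evicted.
        # We never look on "disk" to find out, so a duplicate may be stored.
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        self.index[digest] = block_id
        if len(self.index) > self.max_index_entries:
            self.index.popitem(last=False)  # evict the least recently used hash
        return block_id

    def read_block(self, block_id):
        return self.blocks[block_id]


def write_stream(store, data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return the list of block ids."""
    return [
        store.write_block(data[i:i + block_size])
        for i in range(0, len(data), block_size)
    ]


def read_stream(store, block_ids):
    """Reassemble the original data from its block ids."""
    return b"".join(store.read_block(i) for i in block_ids)
```

With an index large enough for every hash, a stream whose first and third blocks are identical stores two blocks; with room for only one hash, the same stream stores three, and it still reads back correctly in both cases.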
