by burnt-resistor
66 days ago
File system dedupe is expensive because it requires another hash calculation that cannot be shared with application-level hashing. It is also a relatively rare OS/fs feature, doesn't play nice with backups (because files will be duplicated), and doesn't scale across boxes.

A simpler solution is application-level dedupe that doesn't require fs-specific features. Simple scales and wins. And it plays nice with backups.

Hash = sha256 of the file, and abs filename = {{aa}}/{{bb}}/{{cc}}/{{d}} where
aa = the 2 most significant hex digits of the hash
bb = the next 2 hex digits
cc = the next 2 hex digits after that
d = the remaining hex digits
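
For illustration, here is a minimal Python sketch of that layout. The function names (`sha256_hex`, `store_path`, `store_file`), the store root argument, and the copy-then-rename step are my own assumptions, not something the scheme above prescribes; it only shows how the hash maps to a path and why identical content lands in one place.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_hex(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def store_path(root, digest):
    """Map a hex digest to root/aa/bb/cc/d as described above."""
    return os.path.join(root, digest[0:2], digest[2:4], digest[4:6], digest[6:])


def store_file(root, src):
    """Copy src into the store unless that content is already there.

    Hypothetical helper: the temp-file-plus-rename step is an assumption
    added so a partially written file never appears under its final name.
    """
    dest = store_path(root, sha256_hex(src))
    if not os.path.exists(dest):
        dest_dir = os.path.dirname(dest)
        os.makedirs(dest_dir, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=dest_dir)
        os.close(fd)
        shutil.copyfile(src, tmp)
        os.replace(tmp, dest)
    return dest
```

Two files with the same bytes get the same digest and therefore the same path, so the second `store_file` call is a no-op. The store is just ordinary files and directories, which is why any backup tool or any box can handle it.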