HN Mirror


by joshspankit 1344 days ago

There are a few shortcomings:

1. “Lots of tiny files”

Some folders unavoidably have tiny files in bulk (Document backups can be like this. One other example that jumps to mind: macOS applications with translation files)

In these cases, PAR/PAR2 have issues with the block size: each block can only hold data from one file, which leads to a lot of wasted space.
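To put a rough number on the wasted space, here is a minimal sketch. It assumes every file is rounded up to a whole number of blocks; the file count, file size, and block size are made-up values for illustration, not measurements from any real PAR2 set.

```python
import math

def padded_size(file_sizes, block_size):
    """Total bytes occupied if every file is rounded up to whole blocks."""
    return sum(math.ceil(size / block_size) * block_size for size in file_sizes)

# Hypothetical folder: 10,000 translation files of 300 bytes each.
file_sizes = [300] * 10_000
block_size = 4096  # assumed block size, chosen only for illustration

actual = sum(file_sizes)                      # real data in bytes
padded = padded_size(file_sizes, block_size)  # data plus padding in bytes
wasted_fraction = 1 - actual / padded         # share of block space that is padding

print(f"{actual} bytes of data occupy {padded} bytes of blocks "
      f"({wasted_fraction:.1%} padding)")
```

Shrinking the block size reduces the padding but increases the number of blocks, so with one file per block there is no setting that suits both tiny and large files.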

2. Tracking changes across filenames

This is counterintuitive, but I’ve run into it enough to mention it: if the item to archive is a folder whose contents might change over time, any single file might get renamed and its contents slightly modified. A parity tool could look at the blocks that have not changed, recognize the rename, and “correct” the reference before doing more processing. If it’s a valid change to the file, that saves the work of recalculating the whole file; if it’s damage to the renamed file, the tool can simply repair it.
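A rough sketch of how that rename detection could work, assuming the tool keeps a per-file list of block hashes. The function names, the 0.5 threshold, and the file names are all hypothetical; this is not how any existing PAR tool is documented to behave.

```python
import hashlib

def block_hashes(data: bytes, block_size: int) -> list[str]:
    """SHA-256 of each fixed-size block of the data."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

def find_probable_rename(old_index, new_data, block_size, threshold=0.5):
    """Return (old_name, score) for the known file sharing the most blocks
    with new_data, or (None, score) if the best match is below threshold."""
    new_hashes = set(block_hashes(new_data, block_size))
    best_name, best_score = None, 0.0
    for old_name, hashes in old_index.items():
        if not hashes:
            continue
        # Fraction of the old file's blocks that still appear in the new file.
        score = sum(h in new_hashes for h in hashes) / len(hashes)
        if score > best_score:
            best_name, best_score = old_name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score

# Hypothetical parity index built when the set was created.
BLOCK = 16
old_index = {
    "report.txt": block_hashes(b"A" * 16 + b"B" * 16 + b"C" * 16 + b"D" * 16, BLOCK),
    "other.txt": block_hashes(b"X" * 64, BLOCK),
}

# A file shows up under a new name with its last block modified.
renamed = b"A" * 16 + b"B" * 16 + b"C" * 16 + b"E" * 16
match = find_probable_rename(old_index, renamed, BLOCK)
print(match)
```

This only compares blocks at aligned offsets, so an insertion near the start of a file would shift every block and defeat it; a real tool would need something that can find blocks at arbitrary offsets.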

3. Being able to update in-place

Sometimes the ideal is to create parity files for a folder even if that folder is actively used (for example, b-roll that changes by 10% maybe once a month). A parity tool could update that 10% without having to recalculate the whole thing. Ideally, changes would be added explicitly, similar to ‘git add’, so that someone does not accidentally add file damage to the parity set.
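The ‘git add’ idea could look something like the sketch below: compare the folder against a stored manifest of hashes, report what differs, and only accept the changes the user explicitly approves. The manifest layout and the `classify` and `stage` helpers are hypothetical, and actually regenerating the parity data for the staged files is left out.

```python
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def classify(manifest: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Sort current files into unchanged / modified / new, plus missing ones."""
    status = {"unchanged": [], "modified": [], "new": [], "missing": []}
    for name, data in current.items():
        if name not in manifest:
            status["new"].append(name)
        elif manifest[name] == digest(data):
            status["unchanged"].append(name)
        else:
            # Could be a real edit or silent damage; the tool cannot tell.
            status["modified"].append(name)
    status["missing"] = [name for name in manifest if name not in current]
    return status

def stage(manifest: dict[str, str], current: dict[str, bytes], approved: list[str]) -> dict[str, str]:
    """Return a new manifest that accepts only the explicitly approved files."""
    updated = dict(manifest)
    for name in approved:
        updated[name] = digest(current[name])
    return updated

# Hypothetical b-roll folder, held in memory for the example.
manifest = {"a.mov": digest(b"one"), "b.mov": digest(b"two")}
current = {"a.mov": b"one", "b.mov": b"two!", "c.mov": b"three"}

status = classify(manifest, current)
staged = stage(manifest, current, approved=["c.mov"])  # b.mov is left alone
print(status)
```

Because `b.mov` was not approved, its old hash stays in the manifest, so the existing parity data can still be used to check or repair it if the change turns out to be damage.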

1 comment

Nyan 1344 days ago

I see. Yeah, PAR3 tries to address the 'lots of files' scenario better than PAR2.

But updates are problematic. You could 'delete' parts of a recovery file if the data is present. However a file being updated typically means that the original data (before the update) is no longer present, meaning you can't really 'delete' that part of the recovery to replace it with the new data. You could try and retrieve the old data by recovering it, but at this point, it may just be easier to recreate the entire PAR set again.
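A toy illustration of why the old data matters, using plain XOR parity as a simplified stand-in for the Reed-Solomon coding that PAR2 actually uses. The block contents and helper names are made up for the example.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(blocks: list[bytes]) -> bytes:
    """XOR all blocks together into a single parity block."""
    parity = bytes(len(blocks[0]))
    for block in blocks:
        parity = xor_bytes(parity, block)
    return parity

def update_parity(parity: bytes, old_block: bytes, new_block: bytes) -> bytes:
    """Swap one block's contribution: remove the old data, add the new."""
    return xor_bytes(xor_bytes(parity, old_block), new_block)

blocks = [b"aaaa", b"bbbb", b"cccc"]
parity = make_parity(blocks)

# In-place update works only because we still have the old b"bbbb".
new_block = b"BBBB"
updated = update_parity(parity, blocks[1], new_block)
rebuilt = make_parity([blocks[0], new_block, blocks[2]])
print(updated == rebuilt)

# If the old block is already overwritten, it has to be recovered first
# from the parity and every other block, which touches the whole set anyway.
recovered_old = xor_bytes(xor_bytes(parity, blocks[0]), blocks[2])
print(recovered_old)
```

The recovery step reads every other block in the set, which is why recreating the whole set can end up being the simpler option.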

If it's just adding to the recovery set, PAR3 does have provision for that.


joshspankit 1341 days ago

I should thank you, though it’s also meant I’ve lost most of the last 3 days:

After our discussion I went and found the current work on PAR2/PAR3 (for those in the know: the current PAR3 format is not the old one that was never finished, but a spec that’s been rewritten and is close to completion, with real-world tools close behind), and I have a lot more hope for the future of parity files.

I still think they are wildly under-utilized (Backblaze has always used them, but they are the only business I know of), but we might be having a different conversation in 2-3 years.
