by chaxor
1098 days ago
What is the right tool for merging a large set of tar.gz files that overlap enormously, with many identical files and some that have been altered only slightly? Git plus some parsing seems close: analyzing the files to build a dendrogram-like tree of probable alterations over time, ordered by Levenshtein distance, might be useful for approximating a commit history.
However, such a tool doesn't seem to exist, or at least isn't popular.
There's vimdiff or meld, but they are so manual and tedious as to be pointless for something like a long history of takeout tar.gz's. Throwing in the towel completely, borgfs can help reduce the space they take through block-level de-duplication, but this is a poor solution because it doesn't really track file changes in a reasonable way. Extracting the files into a directory, without the tar or gz, is useful, but it raises the question of how to organize the directory structure across the history. Any thoughts, or projects that do a better job of this?
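To make the idea concrete, here is a rough sketch of the first step such a tool might take: index every archive by member path and content hash, so that exact duplicates collapse and only paths with more than one distinct version are left to compare. The function names (`index_archives`, `changed_paths`, `similarity`) are made up for illustration, and `difflib`'s ratio is used as a stand-in for a true Levenshtein distance. It also reads each file fully into memory, which is an assumption that would not hold for very large members.

```python
import hashlib
import tarfile
from collections import defaultdict
from difflib import SequenceMatcher


def index_archives(archive_paths):
    """Map member path -> {sha256 digest -> [archives holding that version]}."""
    versions = defaultdict(lambda: defaultdict(list))
    for archive in archive_paths:
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar:
                if not member.isfile():
                    continue  # skip directories, symlinks, devices
                data = tar.extractfile(member).read()
                digest = hashlib.sha256(data).hexdigest()
                versions[member.name][digest].append(archive)
    return versions


def changed_paths(versions):
    """Paths whose content differs between at least two archives."""
    return sorted(p for p, by_digest in versions.items() if len(by_digest) > 1)


def similarity(a: bytes, b: bytes) -> float:
    """Rough similarity in [0, 1]; difflib's ratio, not Levenshtein distance."""
    return SequenceMatcher(None, a, b).ratio()
```

Paths returned by `changed_paths` are the candidates for the "altered just slightly" comparison; pairwise `similarity` scores between their versions could then feed whatever clustering or ordering is used to approximate a history.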
|
Can you elaborate on this? My understanding is that they should all extract into the same target folder without issues because each archive's set of files is distinct. But maybe I'm just assuming wrongly?
What exactly is your goal, too? It sounds like you are trying to find and de-duplicate visually similar images? Like what do you mean by "enormous overlap" or "altered just slightly"?