| I have on the order of terabytes of digital photos from QuickTake through Nikon's various Ds to the Sony A9, with various pocketables and all the generations of iPhone along the way. I have a quarter million iCloud Photos images, 30K on Flickr, etc. So this looks fantastic! Subscribed ... very willing to be a beta tester and provide detailed feedback. However, the problem I'm finding is a small percentage of file corruption from all the storage upgrading and copying over the years, meaning no given file can be 100% trusted to be a valid original. I haven't found any file or photo deduplication tools with the savvy to figure out which of two identically sized and timestamped files is the least corrupt image. In many cases, a second generation is viewable while the original is present but unusable. This most often applies to very old Aperture libraries that got copied from NAS to NAS over the years, where a "master" may be corrupt but it still has a viewable generated high res cache as a JPEG. Implication is the "structure" of the image files themselves has to be analyzed ... is this an uncorrupted viewable image? Note that with JPEGs and various flavors of RAW, renderers will still happily open and display the file but what humans view can evidence bit rot. Conversely, some files are detected as corrupt by file examination, but can be viewed without problem. To offer "principle of least loss" for mass merge of diverse collections, this would have to be figured out. |
What I've found on my older hard drive backups was file corruption due to bitrot or file truncation.
I use `jpegtran` to validate JPEG bytestreams, `dcraw` to validate RAW images, and `ffmpeg` to validate videos. At least for my quarter-million-file corpus, those tools detect enough corruption for me to want to skip the file. To write tests for this, I actually had to write a bit rotter and do glitch inspection.
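To make that concrete, here's a rough sketch of the idea in Python. It is not PhotoStructure's actual code: the function names, the extension lists, and the exact command-line arguments are my assumptions for illustration. Each command just asks the tool to decode the whole file and throw the output away, and the file is treated as suspect if the tool exits non-zero or writes anything to stderr. `rot_bits` is a toy bit rotter for generating damaged test fixtures.

```python
import os
import random
import subprocess
from pathlib import Path

# Hypothetical extension lists; extend these for your own corpus.
JPEG = {".jpg", ".jpeg"}
RAW = {".nef", ".arw", ".cr2", ".dng"}
VIDEO = {".mov", ".mp4", ".avi"}


def validator_command(path):
    """Return an argv that fully decodes `path` and discards the result.

    The argument choices are assumptions, not necessarily what the author runs.
    Returns None when no validator is known for the file type.
    """
    ext = Path(path).suffix.lower()
    p = str(path)
    if ext in JPEG:
        return ["jpegtran", "-copy", "none", "-outfile", os.devnull, p]
    if ext in RAW:
        return ["dcraw", "-c", p]  # -c writes the decoded image to stdout
    if ext in VIDEO:
        return ["ffmpeg", "-v", "error", "-i", p, "-f", "null", "-"]
    return None


def looks_corrupt(path):
    """True if the validator exits non-zero or reports anything on stderr."""
    cmd = validator_command(path)
    if cmd is None:
        return False  # no validator for this type, so nothing to report
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE)
    return result.returncode != 0 or bool(result.stderr.strip())


def rot_bits(data, flips=1, seed=None):
    """Return a copy of `data` with `flips` distinct, randomly chosen bits inverted."""
    rng = random.Random(seed)
    rotted = bytearray(data)
    for bit in rng.sample(range(len(data) * 8), flips):
        rotted[bit // 8] ^= 1 << (bit % 8)
    return bytes(rotted)
```

A test can then write `rot_bits(original_bytes, flips=50)` to a temp file and see whether `looks_corrupt` flags it. As the quoted post points out, this kind of check has both false positives and false negatives, so it's a filter rather than proof that an image is intact.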
> To offer "principle of least loss" for mass merge of diverse collections, this would have to be figured out
Every unique SHA gets copied into your library (if you have copies enabled), but any given asset will have one or more asset files (that are merged in the UI and DB). To minimize risk from bugs^H^H^H^H "undiscovered features," PhotoStructure never moves or deletes files other than its own cache and db.
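The "unique SHA" part can be sketched like this. It's a minimal, read-only illustration, assuming SHA-256 (I only said "SHA" above) and hypothetical helper names; it covers byte-identical files only, not how PhotoStructure merges different asset files into one asset.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_sha(paths):
    """Map each digest to every path that has exactly those bytes.

    Nothing is moved or deleted; the files are only read.
    """
    groups = defaultdict(list)
    for path in paths:
        groups[sha256_of(path)].append(Path(path))
    return dict(groups)
```

Each key is one unique file to copy once, and each value lists where those bytes were found. A bit-rotted copy hashes differently from its intact sibling, so it shows up as its own group instead of silently replacing the good one.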