Hi there! I haven't launched yet, but you're throwing me a softball here. I am building it: https://PhotoStructure.com
Cleaning up the mess left from early-adopting N photo apps and websites that subsequently shut down is why I took on this project. I've got 20-odd hard drives that have accumulated over the years, filled with backups and libraries from Apple Photos, Aperture, Picasa, several hundred gigs of Google Takeout tarballs, and other ancient DAM apps.
I wanted a single, organized, deduped copy of my photos and videos, skipping the thumbnails and any files that are missing their original EXIF headers or have suffered bitrot. Finally, I've got a single folder hierarchy I can rsync to my NAS or wherever, and know I got everything.
There's a simple SQLite db I use for persistence, and a web server that sits on top of it that makes browsing and searching your whole library feel serendipitous. So yeah, it's Google Photos that lives on your bookshelf.
Viva the distributed web! I'm looking into the applicability of dat and ipfs for secure sharing soon.
I've got a limited number of beta users trying it out right now. If you're willing to share your feedback, please consider signing up. The beta is free.
So this looks fantastic! Subscribed ... very willing to be a beta tester and provide detailed feedback.
However, the problem I'm finding is a small percentage of file corruption from all the storage upgrading and copying over the years, meaning no given file can be 100% trusted to be a valid original.
I haven't found any file or photo deduplication tools with the savvy to figure out which of two identically sized and timestamped files is the least corrupt image.
In many cases, a second-generation copy is viewable while the original is present but unusable. This most often applies to very old Aperture libraries that got copied from NAS to NAS over the years, where a "master" may be corrupt but still has a viewable, generated high-res JPEG cache.
The implication is that the "structure" of the image files themselves has to be analyzed: is this an uncorrupted, viewable image?
Note that with JPEGs and various flavors of RAW, renderers will still happily open and display a damaged file, and the bit rot only shows up in what a human sees on screen. Conversely, some files are flagged as corrupt by file examination but can be viewed without any problem.
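To make the "file examination" side concrete, here is a rough Python sketch of a structural check for JPEGs only. It walks the header segments and looks for the start and end markers. The function name `jpeg_structure_problems` is made up for illustration, and this is not how PhotoStructure or any particular tool does it. It only catches truncation and header damage; it cannot see bit rot inside the compressed scan data, and a file it flags may still display fine, which is exactly the mismatch described above.

```python
import struct

# Markers that carry no length field (TEM and RST0..RST7).
STANDALONE = {0x01} | set(range(0xD0, 0xD8))

def jpeg_structure_problems(data: bytes) -> list[str]:
    """Return a list of structural problems found in JPEG bytes (empty = none found)."""
    if not data.startswith(b"\xff\xd8"):
        return ["missing SOI marker (not a JPEG, or header destroyed)"]

    problems = []
    pos = 2
    saw_scan = False
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            problems.append(f"expected a marker at offset {pos}")
            break
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker, skip it
            pos += 1
            continue
        if marker in STANDALONE:    # two-byte marker without a payload
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2 or pos + 2 + length > len(data):
            problems.append(f"segment at offset {pos} overruns the file")
            break
        if marker == 0xDA:          # start of scan: compressed image data follows
            saw_scan = True
            break
        pos += 2 + length

    if not saw_scan and not problems:
        problems.append("no start-of-scan segment found")
    # Assumption: trailing zero padding is harmless, so ignore it here.
    if not data.rstrip(b"\x00").endswith(b"\xff\xd9"):
        problems.append("missing EOI marker (file may be truncated)")
    return problems
```

Usage would be something like `jpeg_structure_problems(open(path, "rb").read())`. Catching damage inside the scan data would need a full decode plus some comparison of the pixels, which this sketch does not attempt.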
To offer a "principle of least loss" for a mass merge of diverse collections, this would have to be figured out.
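One way to picture "least loss" when several copies of the same photo exist is as a ranking problem. The Python sketch below is only an illustration under stated assumptions: `Candidate`, `problem_count`, and `least_loss` are hypothetical names, and the damage score is assumed to come from whatever integrity check is available. It keeps the copy with the fewest detected problems and prefers an original over a generated preview only when they are equally healthy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    path: str
    is_original: bool   # True for a master, False for a generated preview
    problem_count: int  # hypothetical damage score from an integrity check

def least_loss(candidates: list[Candidate]) -> Candidate:
    """Pick the copy to keep: fewest problems, then originals, then path order."""
    return min(
        candidates,
        key=lambda c: (c.problem_count, not c.is_original, c.path),
    )

copies = [
    Candidate("aperture/masters/IMG_0001.CR2", is_original=True, problem_count=3),
    Candidate("aperture/previews/IMG_0001.jpg", is_original=False, problem_count=0),
]
print(least_loss(copies).path)  # aperture/previews/IMG_0001.jpg
```

The hard part is still the score itself, since a damaged file can pass every structural check and only look wrong on screen.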