| Some years ago, on a large though not especially well-known social network, the task of deleting certain image files which it proved problematic to possess fell into my lap. The list had been curated by ... some process not fully explained to me. A small number of spot checks convinced me that I didn't want to run any further validations myself, and I've rarely shredded any files harder. The total set of images numbered in the millions. Each source image resulted in numerous thumbnail and preview sizes, and differing versions of the service app produced different naming patterns, paths, and locations. All of this was fronted by a CDN with its own deletion mechanisms, which I had to learn and adapt to. The project involved conferences with the CDN's engineers. I rapidly got the sense that large-scale bulk deletes weren't a frequently encountered use case, as the default was to use a web form. That would have taken centuries to complete. Some simple shell and awk could generate all the potential patterns and batch the deletions (about 200 per request, with a return code indicating whether the request was accepted or the queue was full). Documentation and initial tests suggested that it might take weeks, possibly months, to complete the deletions from the CDN. Residency on the CDN in any event was ~9-18 months, though with no clear guarantee of deletion. In practice, I kicked off the job on a Friday afternoon, and it completed over the weekend. The same initial request-generating code could be used to spot-check (random sampling), and eventually to search the space exhaustively and confirm that all deleted content now returned 404. This was well before GDPR, and though the network's userbase numbered in the tens of millions, the engineering staff was small (technology is an interesting multiplier lever, useful when deploying, problematic when dealing with issues at scale). Upshot: deletion can be complicated. It's generally possible, however. 
(A full scrub would have involved backups. I believe that the technical solution to that problem was not having any in the first place. This was largely confirmed when the service fell over completely a few years later. Another warning regarding online SaaS.) |
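
For anyone facing a similar job, the generate-and-batch approach described above can be sketched roughly as follows. This is Python rather than the shell and awk actually used, and everything specific in it is an assumption made for illustration: the size names, the two URL layouts, the `cdn.example.invalid` host, and the `submit`/`wait` callables standing in for whatever the CDN's purge interface really looked like. Only the batch size of about 200 and the accepted-or-queue-full response come from the story.

```python
import itertools
import random

# Hypothetical naming scheme: these sizes, layouts, and the host are
# placeholders, not the real service's patterns.
SIZES = ["orig", "thumb", "preview"]
LAYOUTS = [
    "https://cdn.example.invalid/img/{size}/{image_id}.jpg",
    "https://cdn.example.invalid/v2/{image_id}_{size}.jpg",
]


def candidate_urls(image_ids):
    """Yield every URL an image might live under, across layouts and sizes."""
    for image_id in image_ids:
        for layout, size in itertools.product(LAYOUTS, SIZES):
            yield layout.format(size=size, image_id=image_id)


def batches(items, size=200):
    """Group an iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def purge_all(image_ids, submit, wait, batch_size=200):
    """Submit each batch, retrying while the purge queue reports itself full.

    `submit(batch)` is a stand-in for the CDN call and returns True when the
    request was accepted; `wait()` is whatever back-off the caller prefers.
    Returns the number of URLs whose purge requests were accepted.
    """
    accepted = 0
    for batch in batches(candidate_urls(image_ids), batch_size):
        while not submit(batch):
            wait()
        accepted += len(batch)
    return accepted


def sample_for_check(image_ids, k, seed=None):
    """Pick up to k candidate URLs at random for a 404 spot check."""
    urls = list(candidate_urls(image_ids))
    return random.Random(seed).sample(urls, min(k, len(urls)))
```

The same `candidate_urls` generator feeds both the purge and the verification pass, which is the property that made the original job checkable: sample a few URLs first, then walk the whole space once the queue has drained.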