

by nativeit 1001 days ago

I agree, although whether this is becoming more or less the default is not quite as clear in my mind. On one hand, the sheer mass of content generated every day has grown exponentially (I wonder if this has begun leveling off at all?), so there's more to index and presumably more noise in which to conceal a signal. On the other hand, data science has progressed, storage media is cheaper, and everything is much more accessible. As a result, creepy services like LexisNexis, Palantir, and TLOxp have all become vastly more sophisticated in their ability to retain and analyze data that they can pretty effectively associate with specific people and organizations.

I'm not sure which factor is more influential: the ability for data to persist, or the ability for it to then be found and interpreted. Would it matter if the content were still available in some forgotten corner of the internet if there weren't effective tools for finding it and connecting it back to its author?

It's actually sort of entertaining to test the limits of this on yourself. I tried to find original media and references from a band I played with circa 2002. We were ambitious with our publicity efforts and consistently pumped audio, video, and images onto whatever nascent services were available at the time (from memory: CD Baby, Last.fm, Craigslist, miscellaneous forums, and, towards the end, MySpace). I had already been designing websites for several years by that point, so we had a website, and not just a GeoCities/Angelfire template. That said, I'm pretty sure that was at the height of my career with Macromedia's Flash and ActionScript, so it's no real surprise that it wasn't archived in any functional form.

One strategy that I've found quite effective is to avoid using unique or unusual identifiers. If you're named something like Arthur Dent, it is going to be considerably more difficult to find and associate information with you than if your name is Zaphod Beeblebrox. That's obvious, but it extends to everything else, from usernames to product brand preferences: if you stick to the middle of every given bell curve, then your needle will necessarily reside in a much larger haystack. The few things that tend to be unique, at least when correlated with things like timelines or location (telephone numbers, email addresses, usernames, account numbers, etc.), can usually be effectively obscured one way or another. The things that can't be (government ID numbers) then become crucial to keep private.

The catch is that at least one of those creepy services (TLOxp) is owned by one of the three main credit reporting agencies, and so almost certainly has your Social Security number already. It has been attaching it to all manner of data for several years, all while also selling it to anyone with a budget (not to mention losing it outright to hackers), so any concerted effort to conceal oneself seems almost certainly doomed. It'd be an ideal problem for national governments to address using consumer protection laws and privacy regulations, if it weren't also in our best interests to protect ourselves from said governments.

Sorry for the essay; this line of thought evidently yanked a pretty intertwined thread for me.