Deleting comments, ugh, please don't do this :( If you must, make a throwaway account instead. It sucks to find a thread that promises to solve a problem you have, but when you go there critical context is missing due to deleted comments.
Users should own their comments and should be able to delete them as they see fit. Sites like Reddit should not be used as a permanent or reliable source of information.
Being useful to others who search Reddit is fine (even though I disagree with Reddit's app policy). Being useful to OpenAI, to train ChatGPT and dispense as wisdom that people pay for?
I can totally respect someone trimming all their online footprint to avoid that.
I'm making a tangential statement only somewhat related to the comment above it. Personally I don't like that sites like Reddit claim ownership of and make profit from user generated content. I think people should delete their comments and move to a more open and free platform.
It should be mentioned that the rights are retained by the user. Reddit only gets a license to reproduce that content and to allow others to do the same. Although that license is also perpetual and irrevocable, which makes it similar to ownership, you cannot say they're the same thing.
A lofty ideal, but unpragmatic if you actually care about privacy. You have no control over how information disseminates once you publish it. You must assume it'll remain on the Internet forever. Even if you assume the existence of first-party deletion tools, third parties like Internet Archive or (in Reddit's case) PushShift can choose to preserve it.
If you want to retain your online anonymity, you have to be thoughtful about what you share online.
There's one kind of privacy where someone has an archive of all reddit comments ever and looks you up.
There's another kind of privacy where your cousin sees your reddit user id and looks up your comment history and finds out you are [_____insert secret here___].
Deletion of your history is a protection against the second kind.
Despite the hubris and stupidity of Reddit's management, it's still a longer lived repository of information than most people's blogs or web forums. It's also more accessible than mailing lists, Discord, or other social media sites. It's far from perfect but at least it exists.
I can't imagine Voltaire claiming any right to be able to delete a pamphlet. I think we need to have forgiveness baked into our social layer more, so that forgetting doesn't have to be the default.
It is not great, but it is better (if you wish not to be identified) for it to sit in fewer, perhaps non-public or poorly indexed, archives. The internet does forget, sometimes.
That the internet never forgets is a myth. It forgets surprisingly quickly and I think will only accelerate. Sure, someone may have a copy of something somewhere, but discovering it via any search engine is impossible.
Personally, I put a lot of PHP and VB code out ~20 years ago that I could find easily until I couldn't. There are Myspace profiles I've tried to pull up, images posted by friends 15 years ago in random places. Early video. All gone.
I generally agree. It depends on how Internet-famous you were, how hard a person looks, how common your name is, and whether there's something specific you're looking for about a person. But, yeah, I'm willing to bet that a lot of random casual searches wouldn't turn up some Internet scandal/controversy around a non-famous person unless you really knew what to look for.
I agree, although whether this is becoming more or less the default is not quite as clear to me. On one hand, the sheer mass of content being generated every day has become exponentially larger (I wonder if this has begun leveling off at all?), so there’s more to index and presumably more noise with which to conceal a signal. On the other hand, data science has progressed, storage media is cheaper, and everything is much more accessible; as a result, creepy services like LexisNexis, Palantir, and TLOxp have all become vastly more sophisticated in their ability to retain and analyze data that they can pretty effectively associate with specific people/organizations.
I’m not sure which factor is more influential—the ability for data to persist, or the ability for it to then be found and interpreted. Would it matter if the content was still available in some forgotten corner of the internet if there weren’t effective tools available for finding it and connecting it back to its author?
It’s actually sort of entertaining to test the limits of this on yourself. I tried to find original media and references from a band I played with circa 2002. We were being ambitious with our publicity efforts, and consistently pumping audio, video, and images onto whatever nascent services were available at the time (from memory I can recall CD Baby, LastFM, Craigslist, miscellaneous forums, and towards the end, MySpace). I had already been designing websites for several years by that point, so we had a website, one that wasn’t just a Geocities/Angelfire template. That said, I am pretty sure that was at the height of my career with Macromedia’s Flash and ActionScript, so it's no real surprise that it didn’t get archived in any functional form.
One strategy that my own experience has found quite effective is to avoid using unique or unusual identifiers. If you’re named something like Arthur Dent, it is going to be considerably more difficult to find and associate information than if your name is Zaphod Beeblebrox. That’s obvious, but it extends to everything else, from usernames to product brand preferences: if you stick to the middle of every given bell curve, then your needle will necessarily reside in a much larger haystack. The few things that tend to be unique, at least when correlated with things like timelines or location (telephone numbers, email addresses, usernames, account numbers, etc.), can usually be effectively obscured one way or another. The things that can’t (government ID numbers) then become crucial to keep private. Except at least one of those creepy services (TLOxp) is owned by one of the three main credit bureaus, and so almost definitely has your social security number already, and has been attaching it to all manner of data for several years, all while also selling it off to anyone with a budget (not to mention losing it outright to hackers), so any concerted effort to conceal oneself seems almost certainly doomed. It’d be an ideal problem for national governments to address using consumer protection laws and privacy regulations, if it weren’t also in our best interests to protect ourselves from said governments.
Sorry for the essay, this line of thought evidently yanked a pretty intertwined thread for me.
I agree with your critique that stylometry overrides throwaway accounts.
The issue is "deleting."
AFAIK, nobody's disagreeing that the internet can forget and deleting is "better" for privacy.
The question is "how much" better.
Here we're talking about the false sense of privacy through erasure when archivers should be assumed to be running everywhere and at all times. The latter weighs in favor of the parent's critique that deleting is not constructive.
> I wonder if one could hook up a browser extension to do this
It seems totally feasible. Though I think it would be far more interesting to make a purpose built anti-stylometry tool, that explicitly tries to analyze for and mute the signals stylometry uses.
Edit: what I'm talking about is apparently called "adversarial stylometry":
> All adversarial stylometry shares the core idea of faithfully paraphrasing the source text so that the meaning is unchanged but the stylistic signals are obscured
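To make "the signals stylometry uses" a bit more concrete: one classic signal is how often an author leans on common function words ("the", "of", "but", "just"...). Below is a minimal sketch in Python, assuming a tiny, hypothetical word list and plain cosine similarity; real stylometry tools use far richer features (character n-grams, punctuation habits, sentence length, etc.), so treat this as an illustration of the idea, not an attack or defense tool.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny list of function words. Real tools use
# hundreds of features; this only illustrates the kind of signal involved.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it", "i", "but", "just", "really"]


def profile(text: str) -> list[float]:
    """Relative frequency of each function word among all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two profiles; 0.0 if either profile is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    a = "I just think that it is really the best of the options, but I could be wrong."
    b = "I just feel that it is really the worst of the lot, but I might be wrong."
    print(cosine(profile(a), profile(b)))
```

Two comments that use the same function words in the same proportions score close to 1.0 even when their topics differ, which is exactly what an adversarial paraphrase would try to flatten while keeping the meaning intact. It also hints at why a throwaway with only a couple of short comments is hard to link: the profiles are built from very few words and are correspondingly noisy.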
Having an LLM rewrite a comment would do this entirely, no?
Are you just interested from an academic perspective how one might build something more surgical, that only changes some words in a comment?
> Having an LLM rewrite a comment would do this entirely, no?
> Are you just interested from an academic perspective how one might build something more surgical, that only changes some words in a comment?
Yes and no. ChatGPT seems like a blunt instrument, and given it's not purpose-built, it could miss certain characteristics that could enable identification.
Also, LLMs kind of have their own style, and adopting that particular style is likely self-defeating when trying to get a message out (e.g. I would tend to ignore something that sounded like it was written by ChatGPT). Manual correction would then be needed, but that would be hard with such a system, because I think it would tend to re-introduce the author's style.
Using a local LLM could work just as well if all you are doing is asking it to slightly change existing text. The concept itself, paired with throwaway account(s), seems better than alternatives like just deleting everything.
Stylometry can link long-lived identities, but can't do much if every throwaway only posts a couple of comments. There needs to be enough content to analyze. Especially if the overall community is large, like Reddit.
Is there no automated way to defeat stylometry? Software that rewords and restructures writing so that it becomes unattributable? AI should be able to do it.
I delete my comments on reddit because it's such a toxic place. Trolls will read through past comments and twist them to harass you even more. I purge all my comments that have low upvotes and leave the ones other people find useful. But a lot of the time downvotes don't mean you are wrong or said something mean; it's just the stupidity of the masses or other people being mean. It's a love/hate relationship with that site.
Somehow, I can't imagine how HN is any better. Reasonable comments are mass-downvoted when people disagree with you on seemingly hot button topics - blm, vaccines, musk, trump, immigration, lgbtq etc. I know many people who had to create new accounts because of this.
Merely being downvoted isn't a form of harassment. The worst people on Reddit will DM you vile shit/threats and stalk you across various subreddits if they think you're a person worth targeting (particularly if you belong to one of those so-called "hot button" marginalized groups). Meanwhile HN doesn't even have a DM feature.
If you express a reasonable opinion that is even mildly positive, or even ambivalent about musk or trump you get mass downvoted, and people assume you completely lack integrity. I have seen this exact behaviour on HN for years, and the mods do nothing, and the bad-actors are often the long-term users with tons of karma. I'm not into conspiracies so I won't speculate as to the reasons.
Anyway, that in and of itself doesn't bother me. The parent was trying to present a "holier than thou" attitude towards Reddit; I merely pointed out that HN is not all that great when it comes to that.
>mildly positive, or even ambivalent about musk or trump you get mass downvoted, and people assume you completely lack integrity
With the amount of time musk and trump have had in the spotlight and the awful shit they've objectively actually done, it should make people question your integrity if you are writing positively about them. Like it or not, downvotes are an expression of disapproval.
Most elections are about electing the least-bad person. Bush started wars, Biden/Obama dropped bombs on civilians (even US citizens), tortured detainees, trump... well, we all know what trump did. Most voters voted for either biden or trump, and both have done "awful shit". Every politician has done "awful shit".
If you can't find a single positive impact of trump's (or any politician's) policies or a single positive thing musk has done, consider that you may be in an echo chamber yourself, or that you're just deluding yourself into thinking you have some kind of moral high ground.
I assume people downvote you about Musk and Trump because they disagree with your takes and think you're not contributing anything interesting to the discussion. People are allowed to downvote. What would you propose mods do in that situation?
Honestly, "everyone" knows there are acceptable boundaries of opinion in a lot of circles generally (including this one) that even nuanced takes cannot cross without bringing out the knives. By and large, it makes sense to accept that and move on. You likely won't change anyone's mind and you'll just get upset.
Sure, someone could have a bad take, could be simply wrong, or simply have a different opinion. Why do you assume that I cannot differentiate between them?