by chaosjevil 1098 days ago
There's a second response: to not reward AND punish the offending party.

I've been recommending that people replace their Reddit content with literal gibberish (from random generators), and then delete their accounts. Each person who does this makes Reddit's data less valuable for LLMs, and eventually it means that not even Google, Amazon, Microsoft etc. would bother paying for API access.


I sank 15 years of myself into Reddit. I am there in all my teenage angst, and you get to see me mature before your eyes.

I can't just delete. I've helped people there, and been helped. If my data is part of a corpus that betters humanity, then so be it. Hopefully one day that corpus will be released under a FOSS license.

I can relate to your intentions, but keep in mind that, when you leave your data there, you're effectively encouraging people to keep using a user-hostile platform that will likely wall those people's content in. In the long run, you aren't making humanity better - you're worsening it.

Instead, a better approach is to migrate whatever you deem useful in your Reddit history to another platform, and then remove it from Reddit, either by deleting it or by replacing it with gibberish.

You might also be interested in this text, as food for thought:

https://karl-voit.at/2020/10/23/avoid-web-forums/

It has been shared a few times here on HN, so I believe that plenty of users here know about it.

The thing is that there's a lot of valuable information that I don't think we should just delete like that.

When you search about a Neovim issue for example, often the solution is in an old Reddit thread. When you delete that, it will be gone forever.

Even more valuable info will be generated, and I don't think that we should just cram it there like that. We would be exposing this new info to information loss.

And someone might say "I won't post it there", but once a person leaves some info on that site, they're encouraging others to interact with it and generate more info there, in that walled garden, instead of somewhere else.

Note that the potential for information loss on Reddit does not come just from users deleting their stuff. It also comes from mods (including automod) and Reddit Inc. itself. One day Reddit will decide "we're going to flush out old content!", and there goes your info anyway, no matter if you deleted it or not. Or Reddit itself will go offline, and the info in it will be lost, just like the info from the forums that Reddit itself killed. That's the main reasoning in the link that I've provided, and you know what, I think that the author is 100% right.

Also note that there are ways to reduce the information loss. People can - and IMHO they should - migrate that info, before removing it from Reddit.

I think that you're looking at the info present there _now_ in a short-sighted way, without realising the consequences elsewhere.

EDIT: and as another commenter highlighted, everything has been archived already. The info loss will be way, way lower than you think.

>The thing is that there's a lot of valuable information that I don't think we should just delete like that.

Agreed. In the past 2 days I've been overly annoyed, as on both days multiple Google queries dumped me on seemingly useful threads that I cannot see because of this foolish nonsense of shuttering communities in "protest".

Export it, put it on your website. If you ever want to "own" any content on the internet, a website is still the closest thing.

It already is, and everything through March 2023 was archived by Pushshift and there are torrents floating about.

It's about 2TB in zstd compression, so finding the needles will be interesting - but it'll probably be much easier years from now.
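
For anyone wondering what "finding the needles" could look like in practice, here is a minimal sketch, not a tested tool. It assumes the dump files are newline-delimited JSON with one record per line and an "author" field, and that decompression is done outside the script (for example by the zstd command-line tool). The function name find_comments and the file name in the usage note are made up for illustration.

```python
import json
import sys
from typing import Iterable, Iterator


def find_comments(lines: Iterable[str], author: str) -> Iterator[dict]:
    """Yield records whose "author" field equals `author`.

    Assumes one JSON object per line; malformed lines are skipped.
    """
    needle = f'"{author}"'
    for line in lines:
        # Cheap substring check first, so most lines are never parsed as JSON.
        if needle not in line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and record.get("author") == author:
            yield record


# Only runs when called as a script with exactly one argument (the username).
if __name__ == "__main__" and len(sys.argv) == 2:
    for match in find_comments(sys.stdin, sys.argv[1]):
        print(json.dumps(match))
```

Usage would be something along the lines of `zstd -dc --long=31 RC_2023-03.zst | python find_comments.py some_user`, where the file name and the need for the `--long=31` window flag are assumptions about how those dumps were compressed. Streaming from stdin means the 2TB never has to be decompressed to disk.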

Wow. Just... wow.

Thank you for this info, it's fucking great. This means that the info loss for whatever came before March 2023 will be exactly ZERO.

I once held a similar view, but then realised that kind of data really isn’t worth holding on to. I deleted my 15 years' worth of posts and my account, and felt a massive weight had been lifted. Reddit had a negative effect on me, and freeing myself of that was well worth it.

I should also mention that Reddit archives exist, so those posts will live on somewhere.