by speedgoose 1524 days ago
If you want to delete your Reddit account from the pushshift Reddit dataset, you have a Google form: https://teddit.net/r/pushshift/comments/pat409/online_remova...

> Right now, we internally blacklist the account so that the data is not exposed via any public API. For full disclosure, we currently do not permanently delete any data unless there is a major issue involving PII, etc. While you have the right to request that people cannot search your comments and submissions via the public API, we reserve the right to keep data in our private archive so long as we never allow any data that you requested be removed get exposed through any public API endpoints.

Even after you request removal from pushshift, your deleted comments can still be found in the archives, e.g. the data that https://camas.github.io/reddit-search/ uses.
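
To make the distinction in the quoted policy concrete, here is a minimal sketch of "blacklisting" an account versus actually deleting its data. This is not Pushshift's code and I have no idea how their backend works; `ARCHIVE`, `BLACKLIST`, `public_search` and `hard_delete` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    author: str
    body: str


# Hypothetical in-memory stand-ins for a private archive and an opt-out list.
ARCHIVE = [
    Comment("alice", "first post"),
    Comment("bob", "please forget me"),
    Comment("alice", "second post"),
]
BLACKLIST = {"bob"}


def public_search(author=None):
    """Return archived comments, hiding blacklisted authors at query time."""
    return [
        c
        for c in ARCHIVE
        if c.author not in BLACKLIST and (author is None or c.author == author)
    ]


def hard_delete(author):
    """Actually remove an author's comments from the archive."""
    ARCHIVE[:] = [c for c in ARCHIVE if c.author != author]


if __name__ == "__main__":
    print(public_search("bob"))   # hidden from the public search
    print(len(ARCHIVE))           # but the archive still holds every record
    hard_delete("bob")
    print(len(ARCHIVE))           # only now is the data really gone
```

With a filter like `public_search`, the opt-out only changes what one endpoint returns. Any copy of the data made before the filter was added, or served by something that skips the filter, is unaffected.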

> we reserve the right to keep data in our private archive so long as we never allow any data that you requested be removed get exposed through any public API endpoints.

This is why I'm really happy to have GDPR. If this were a European company someone would eventually tear them a new one.

"Reserve the right" is a cheeky choice of words for keeping data that isn't theirs against people's consent.

> "Reserve the right" is a cheeky choice of words for keeping data that isn't theirs against people's consent.

How is it "your data" if you use their property (reddit.com) to post content on their website? IP and other PII I can understand; reddit gets that no matter how you interact with the website, so it makes sense to protect it. But as soon as I make this comment on HN, I don't really expect to "own" this comment anymore; it now belongs to news.ycombinator.com.

When you make a post on Reddit, you give that site permission to publicly display and retain it. However, what is being talked about here is a third party scraping that post and keeping it without your permission; the permission you granted was to Reddit only.
If I post to Reddit, I'm giving Reddit permission to display that comment. I haven't read the T's and C's, but my expectation would be that I retain copyright over my own comments.

I could well be wrong (probably, even :), but my point is that giving permission to use my comments on one site shouldn't automatically give carte blanche to anyone and everyone to use them.

Well, you almost certainly give reddit a free, unlimited, global, transferable, [...] license to reproduce your comments. That's just standard, and it's unclear how they could display your content otherwise. Whether there is any way for you to revoke the license (either in the T&C or in applicable law) is a different matter.
Any sane ToS gives them irrevocable, royalty-free use of content you post.
Not according to GDPR and its right to be forgotten. The data subject always has the last say about data that concerns them.

Just because data about me exists doesn't give unlicensed third parties unlimited rights to it.

Good luck with enforcement.
That has no bearing on whether or not it's right or legal.
> But as soon as I make this comment on HN, I don't really expect to "own" this comment anymore, it now belongs to news.ycombinator.com

  In Article 17, the GDPR outlines the specific circumstances under which the right to be forgotten applies. An individual has the right to have their personal data erased if:
  
  The personal data is no longer necessary for the purpose an organization originally collected or processed it.
  An organization is relying on an individual’s consent as the lawful basis for processing the data and that individual withdraws their consent.
  An organization is relying on legitimate interests as its justification for processing an individual’s data, the individual objects to this processing, and there is no overriding legitimate interest for the organization to continue with the processing.
  An organization is processing personal data for direct marketing purposes and the individual objects to this processing.
  An organization processed an individual’s personal data unlawfully.
  An organization must erase personal data in order to comply with a legal ruling or obligation.
  An organization has processed a child’s personal data to offer their information society services.
  
  However, an organization’s right to process someone’s data might override their right to be forgotten. Here are the reasons cited in the GDPR that trump the right to erasure:
  
  The data is being used to exercise the right of freedom of expression and information.
  The data is being used to comply with a legal ruling or obligation.
  The data is being used to perform a task that is being carried out in the public interest or when exercising an organization’s official authority.
  The data being processed is necessary for public health purposes and serves in the public interest.
  The data being processed is necessary to perform preventative or occupational medicine. This only applies when the data is being processed by a health professional who is subject to a legal obligation of professional secrecy.
  The data represents important information that serves the public interest, scientific research, historical research, or statistical purposes and where erasure of the data would be likely to impair or halt progress towards the achievement that was the goal of the processing.
  The data is being used for the establishment of a legal defense or in the exercise of other legal claims.
Source: https://gdpr.eu/right-to-be-forgotten/

I wonder how all of this works out when you're taking public comments from someone else's forum and re-hosting them. Someone who specializes in law, or who is just more familiar with GDPR and related laws, might be able to comment in more detail. I'm just linking the source, which attempts to clarify these things; hopefully it's useful.

It says it's using the pushshift api on the page.
Don't know what to tell you, I tried opting out a year ago [1] and they still have some of my deleted comments today [2]. Pushshift is a massive GDPR violation so I'm surprised they haven't gotten into any trouble yet.

[1] https://imgur.com/a/Gjp7YOA

[2] https://camas.github.io/reddit-search/#{%22author%22:%22trin...

Report them to the relevant authority if you can. Assholes like this need to be stopped.
They get reported and fined €10m. However, if they're not European and have no EU-based assets, what makes you expect they'd care or ever pay?