I haven't read GDPR in detail, but isn't GDPR concerned only with personal/private data? The Wayback Machine only archives public pages as far as I can tell...
Yes, but the article itself is user-provided content on Medium that the author has a right to ask to be deleted (under GDPR), presumably? So perhaps it will simply be a matter of the Wayback Machine having to have a policy to delete things if requested?
No! GDPR is about personal data, which is well defined in the regulations and does not include blog posts. The right to delete data (or "be forgotten") is nothing to do with GDPR.
If the original post contained personal data, that is a different issue, but if the data was put out into the public domain, it is a hard problem to solve.
No, if you intentionally made that data public then it's done. GDPR doesn't, say, force you to remove political views of Theresa May from newspapers, despite that being covered by personal data, because Theresa May made those views public.
The Wayback Machine has always had a policy to delete things if requested, so there's no real change there. The most common way site owners do that is by changing robots.txt. In line with the Oakland Archive Policy [1], the Internet Archive respects robots.txt retroactively, so a site owner can get archived versions deleted just by excluding them in the robots file. Besides that, they respond to DMCA takedowns, one-off removal requests [2], etc.
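For anyone wondering what that robots.txt exclusion looks like in practice, here's a rough sketch. It assumes the crawler identifies itself with the user-agent token `ia_archiver` (the token historically associated with the Archive; treat it as an assumption here), and the helper name `is_excluded` and the example.com URL are made up for illustration. It only shows how such a rule is read by a standard parser, not how the Archive itself implements it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish to opt out.
# The "ia_archiver" token is an assumption, not taken from the thread.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_excluded(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-post"  # placeholder URL
    print(is_excluded(ROBOTS_TXT, "ia_archiver", page))
    print(is_excluded(ROBOTS_TXT, "SomeOtherBot", page))
```

The rule names a single crawler, so other bots are unaffected; a site owner who wanted to block everything would use `User-agent: *` instead.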
Changing robots.txt does not delete content from their archives. If you remove the robots.txt file, the content becomes viewable again.
There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content. There are only three possibilities: avoid the EU as much as possible, dump the archives and start over with an entirely different approach, or shut down. Besides that, these laws are going to get a lot more strict and difficult to comply with, not less strict, over time. This is merely the beginning of aggressive regulation of the Internet. Regulation of the Internet will only move in one direction from here: toward increasing burden and ever greater regulation. It's hard to imagine Archive.org's archives surviving what's coming.
> There's no scenario where they can respond to the vast scale of GDPR violations that their archive likely represents, when it comes to manually removing content.
"GDPR violations". What's that, exactly? As far as I know, you only have to remove personal data upon request, not preemptively. So I don't see how they are "violations".
Will a lot of people make these requests? Possibly, but where's the evidence of that? People have been able to use copyright takedown requests (e.g. under the DMCA) forever, yet the Archive is still around.
Actually, the recommended data handling says you should specifically state the purpose for needing the data, and that it should be reasonably limited to that need; i.e. if you don't need it any more, you should proactively delete it.[0]
Read the law before posting wildly misleading comments like this.
If you explicitly make something public, you can’t later come and claim that this information is actually crucial to your privacy. If so, you yourself were the one who violated that privacy, not the company later archiving/caching/processing your public article.
GDPR is all about decency and common sense wrt. user data and privacy.
No need to spread FUD about something that simple. SV proved tech companies can’t be trusted to act ethically, so here comes the regulation. Deal.
Given the immense scale of Archive.org, there must be a truly incredible number of sites & pages with personal data & content in the pages. Millions upon millions of pages, due to the repeat archiving.
Comments with usernames. Comments with IP addresses (sometimes old comment systems would allow you to comment without registering, but they'd show all or part of your IP address). Comments with personal information in the messages. Comments with email addresses. Blog posts with all sorts of personal details from the author. Personal user account pages, such as the kind you see on sites like Ask.fm or similar, with vast amounts of user information and personal details that can't be deleted. And on it goes. Archive.org is storing all of that and does not allow it to be deleted. Further, it would be nearly impossible to figure out what content is compliant and what is not within the archives. It's a giant GDPR violation system. Their only sane bet is to stay away from the EU, jurisdiction-wise, as much as possible, or shut down.