by skrebbel 4732 days ago
To all the skeptics in threads like these: asking that Facebook actually, entirely, erase all your data isn't a reasonable demand.

Unless, of course, you're also OK with Facebook's walled-garden, Facebook-is-the-Internet, CompuServe-wasn't-so-bad strategy.

Anyone who ever tried to delete content from the internet knows this: The Internet Archive has long since made a copy of what you're trying to delete. Anything you post on the internet, the real, open internet, is forever.

If you don't want that, then you'll need to accept that a single entity has full control over what happens with your content, locks it behind a login so that it can't be easily mined, and does with it what it pleases.

I'm not willing to accept that. I'd rather that my content belongs to everybody than that it belongs to Facebook. But you can't have it both ways.

6 comments

I agree with your general point, but your example is not correct.

The Internet Archive makes it very easy to remove content from their site. I had a family web site accessible to the public for 12 years with about 25,000 pictures on it. For various reasons I took the site down, but archive.org still showed the pictures. A quick change to robots.txt stopped that though. Basically, even if the archive grabbed the files at a point in time, they periodically check robots.txt to make sure they still have the right to those files. If they don't, they won't display them. I was very impressed with them when I learned this.
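
For anyone who wants to check what such a rule actually blocks, here is a minimal sketch using Python's standard `urllib.robotparser`. The `ia_archiver` user-agent name, the robots.txt body, and the example.com URL are assumptions for illustration, not details from my site; check the archive's FAQ for the agent name they honor.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; "ia_archiver" is an assumed crawler name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://example.com/photos/1.jpg"  # placeholder URL
    print(is_allowed(ROBOTS_TXT, "ia_archiver", url))
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", url))
```

This only tells you what the file asks for. Whether a given archive honors it, and whether it hides already-archived copies, is up to the archive.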

It's also IMO a reasonable question what the "right" behavior should be in such a situation. I'm tempted to make the argument that once content has been made available to the world and archived by the Internet Archive, my descendants or a new corporate owner shouldn't necessarily have the right to remove that content from public view at some time in the future.
I'm in this camp. I owned a 4 letter TLD that I was first registrant on in 1994 and held it until I sold it in 2001. I had lots of interesting things published on the site, and as soon as the new owner took the domain, he put up a robots.txt blocking the site and my years of content disappeared from the archive. :(

I keep toying with the idea of trying to buy the domain back but its value has become somewhat prohibitive. Maybe when I win the lottery :/

Make sure you maintain control over the domain and always have a robots.txt file present, because if you ever lose that, those files will become visible again. Good luck arranging for your descendants to do this after you die.
"If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org."

http://archive.org/about/faqs.php#2

just mail them, they are nice people.
it would be nice to have some standard method to delete the content.
Emailing them is the standard method.

Sometimes, you know, you've got to, like, talk to people.

I did this with my old website and it worked great.
I'm pretty certain that this isn't the case. If anything, it's the other way around - if a domain changes hands, and the new owner sets up a restrictive robots.txt, the content in the archive is made unavailable, and stays unavailable even if the robots.txt (or the whole domain) later disappears.
This has been my experience as well. Once the content is deleted, it's gone.
This is not true.
It's perfectly possible and reasonable to delete all of your data.

It's just hard.

We have to do this with our clients if they shift off our platform and, believe me, it's not much fun deleting 20-100 GB datasets from a shared database with over 2000 tables in it on production kit.

But we do it, because we are honest.

Facebook are dishonest. Simple as.

Hmm, I'm not sure what you mean by "our platform", but if I post a YouTube video, it gets popular, and I then get ashamed of it and decide to delete it, the video will already have spread far and wide to other video sites.

Same for public web pages and the Internet Archive, for Stack Overflow answers/Wikipedia entries and SEO rats, and so on.

How is "your platform" going to help me delete my embarrassing drunk student video from the internet's video sites?

The only way "your platform" can do this is by actively working to block public access to that video in the first place. Then you can have fun (or not) deleting those 20-100 GB datasets. My point is that this means accepting that you're posting stuff to a walled garden.

It sounds like his platform is perhaps a private enterprise platform? In other words, it doesn't sound like Facebook or Youtube where information can easily spread, which is probably a prerequisite for having control over your data.

The company I work for deals with background checks and screening information on behalf of our clients' clients. I could definitely see us safely removing all of that personal information from our system and not being able to recover it. But at the same time, we're a much much much smaller organization compared to Facebook.

> Anything you post on the internet, the real, open internet, is forever
I've tried looking for stuff I posted on craigslist 5 or 6 years ago and nothing was in the wayback machine, only dead links to a few posts.

Apparently the crawls of the site only went a couple/few layers deep, as well as crawls not being conducted daily, and some years had only 2 or 3 crawls for the entire year.
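
To illustrate why a shallow crawl would miss individual posts, here is a rough sketch of a depth-limited breadth-first crawl over an in-memory link graph. The site layout and the depth limit are made up for the example; I have no idea what settings the real crawler used.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str, max_depth: int) -> set[str]:
    """Breadth-first walk of a link graph, stopping max_depth hops from start."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links from pages at the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical layout: home -> category -> listing page -> individual post.
SITE = {
    "/": ["/forsale"],
    "/forsale": ["/forsale/page2"],
    "/forsale/page2": ["/forsale/post123"],
}

if __name__ == "__main__":
    print(sorted(crawl(SITE, "/", max_depth=2)))
```

With a limit of two hops, the individual post three clicks from the home page is never visited, so it never gets archived.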

I haven't tried searching my old facebook posts yet though, maybe some sites are more thoroughly catalogued than others.

Perhaps some other entity is storing my old posts, but I doubt that the NSA will allow me to access my own data.

This doesn't mean someone else didn't archive the material.

Though it might not be generally / publicly available. Could still be on sale somewhere.

Do you also delete their data from all your database backups?
Yes. They are cycled out after a week. Our data is useless after then as most of it is real time. Audit logs are kept offline in gzipped daily text files per client and these are handed over to the company and deleted.
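
A week-long cycle like that boils down to something like the sketch below. This is not our actual tooling; the file names, the dates, and the seven-day default are placeholders to show the idea of picking which backups have aged out.

```python
from datetime import date, timedelta


def expired_backups(
    backups: dict[str, date], today: date, keep_days: int = 7
) -> list[str]:
    """Return names of backups taken more than keep_days before today, sorted."""
    cutoff = today - timedelta(days=keep_days)
    return sorted(name for name, taken in backups.items() if taken < cutoff)


if __name__ == "__main__":
    # Hypothetical backup inventory: file name -> date the backup was taken.
    inventory = {
        "db-2013-06-10.dump": date(2013, 6, 10),
        "db-2013-06-13.dump": date(2013, 6, 13),
        "db-2013-06-19.dump": date(2013, 6, 19),
    }
    for name in expired_backups(inventory, today=date(2013, 6, 20)):
        print("would delete", name)
```

The consequence is that a client's data deleted from the live database can still sit in backups until the oldest backup containing it has been cycled out.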
Not sure I understand. If Facebook were federated, it would not be reasonable to expect it to delete your data for the practical reasons you mention. Facebook is not federated, so they presumably are able to delete your data. Maybe it's naive to expect them to behave in an honest or considerate manner, but it's not unreasonable to ask it as far as I can see.

Edit: Of course there's nothing they can do if somebody's screenshotted your post or stored its contents in their brain, but that's not what we're talking about.

> I'd rather that my content belongs to everybody than that it belongs to Facebook. But you can't have it both ways.

You can publish your stuff on Facebook or G+ as "public"... I saw some people are doing that, using it as a blogging platform. Then it belongs to everybody but you are also taking advantage of the social elements on those networks.

No, it "belongs" (in your sense of the word) to the platform used to publish it. You lose a significant part of your rights to that content (be it public or private).

The only way to "own" your content is to publish it on your own domain, on a host that you simply pay for storage. You might then link to the content via your (so called) social playground of choice. But you will own the content in a "normal" use of the word.

You can even explicitly state a copyright notice and lay out your terms, telling everybody that this is your creation, your property...

Posting it publicly on any platform that you do not own will end in you losing rights to your creation.

Actually the Internet Archive does not spider content on Facebook:

http://web.archive.org/web/*/http://facebook.com
https://www.facebook.com/robots.txt

Just because it's not behind a login doesn't mean that you've got more control over it.