by willemlabu 1972 days ago
This goes against my internal logic about personal privacy. The solution to online privacy and data mining is not collecting it all in a central repository; it is not collecting it at all.

Further, and I realise this will come off as alarmist: what happens if the software suffers a 0-day? All that data will then be nicely aggregated for a bad actor. Somehow, knowing that there is perhaps a non-trivial amount of work involved in collecting data and compiling a profile from many different sources feels safer than putting it all in one place.

7 comments

Agree - your comment makes me think of a tweet[1] I read recently:

"If you're collecting personal data, 'how should I protect this?' is actually your third question.

'Should I collect this?' is only the second question.

The first question is 'what would the worst people do if they got hold of this?'"

[1] https://twitter.com/eey0re/status/970144255745212416

So what will Facebook do with it once they bully/buy the platform?

> All that data will then be nicely aggregated for a bad actor.

If I were to host my own backend for personal data on server/platform XYZ and there were a 0-day for platform XYZ, the bad actor would need to actively seek out my server and get my data from it. But a nicely structured datadump is not particularly valuable if it's just one person. So you would need to hunt down all other instances of XYZ and aggregate their data to get something someone would like to pay for. But this aggregation is stale and when xyz is patched and months have passed you just have gigs of data that has gone bad, and just like rotten fruit that won't sell for much. So I would say that, in practice, given enough decentralization and a lot of competing platforms, the hypothetical bad actor in this scenario is much worse off than the non-hypothetical bad actors we have running around and fucking with our data right now: FAANG et al.

You make a good point. PDS solutions aim to get rid of "big data" and centralised data lakes that can be queried. It's not inherently a bad idea, but:

> a nicely structured datadump is not particularly valuable if it's just one person

This very much depends on the person.

> But this aggregation is stale and when xyz is patched and months have passed you just have gigs of data that has gone bad, and just like rotten fruit that won't sell for much.

Not really. First off, I would imagine it would be possible to script finding people's servers and scraping them for data. Ultimately these servers will have to be hosted somewhere, and tools like masscan make it easy to rapidly find servers running software that you can exploit. What's more, the individual person now bears this risk. Sure, a couple of experienced sysadmins like you or me would know how to secure our data, make the server difficult to scan or probe, and make it difficult to access in the worst case. But how many users are actually going to put in the time to learn system administration and ensure that a server they are hosting is secure? It takes a lot of work, especially if you do not know the first thing about computers.

The end result will be the rise of businesses whose job is to host these servers, and now you are back where you started, except worse! Today I can reasonably assume that a breach of my welfare data does not mean the attackers could also access my medical records. With everything in one place, that is no longer the case!

Secondly, even data that you would assume is stale can be important and valuable. Old phone numbers, for example, are still useful because they can be used to construct a history for a given person, and identity confirmation procedures often require listing old information along with new information. (A friend recently had to list places they had lived to confirm their identity, and was unable to do so because the form requested a full list of addresses they had lived at before they were ten (!).) Data such as medical records or your National Insurance Number does not tend to lose its value just because it isn't from this year. Old security questions and passwords are often just as valuable as new ones; old information can be used to construct a 'good enough' profile, which can then be used to sniff out newer, more useful information or to rapidly generate likely passwords, among other things.

Thanks! Very valid points. I left out all the nuances to draw out some counterpoints, and yours are spot on. I think the biggest issue, as in most federated/decentralised scenarios, is the (inevitable?) backend/server hosting providers that will crop up. In this case there would be very large incentives to provide "easy solutions" that hide the technicalities while leaving loopholes to aggregate and sell data. The individual data points might be encrypted, but a provider could monitor what kinds of data consumers are attached to a PDS and, based on how much activity those consumers generate, aggregate and sell data about, e.g., users with many or very active fitness-related data consumers, then target these users with ads for fitness equipment.

Disclaimer: I couldn't really grasp how Personium works from the "app screen demo" but it didn't stop me from commenting...

> The solution to online privacy and data mining is not collecting it all in a central repository, it is not collecting it at all.

Am I misunderstanding this?

It seems to be an open source, self-hostable server, not a central repository?

You'd be concentrating your own data on your own server, which means you're one step away from someone capturing all your data.

Oh.

Classic Internet, then: if it can't be perfect, why bother at all? Let's just complain and look at all those stupid people sharing things on Facebook?

Alternatively, we can set up our own servers that won't be monetized by Facebook or Google, but since they can potentially be broken into, why bother?

Or do you mean we should keep our data on hard drives in a safe and plug them into an airgapped computer whenever we want to look at photos or listen to music?

This is a bit harsh, but this is an important topic, and at the moment I cannot come up with a better way to put it.

I'm not advocating for anything specific here. :) And frankly, I take your point and actually agree wholeheartedly. The status quo of data mining and tracking is terrible, and leads to exactly what you're talking about: people changing their behaviour (not just online) because they feel like they're being watched[1].

I realise I'm not providing a solution. I wouldn't even feel confident pointing in a general direction. I'm merely pointing out that I don't believe the right way to solve this problem of personal data aggregation is consolidating all this personal metadata into a single spot.

[1] https://www.socialcooling.com/

Ah, ok.

I have a couple of ideas (several of them could be combined here to some degree):

- improve VPNs to the point where people can and will use them to browse their photos

- make hardened login solutions, run services behind that

- local hosting providers, stronger data protection rules

- fringe benefits at work or as part of union membership? (I admit I don't like the lock in aspect of this)

- make software local only by default

etc

There are data brokers -- legally operating companies -- which already collect personal data of entire populations in a central place.

I understand this, and there's an argument to be made about ethics here, but these companies do have a non-trivial amount of work to do to collate the data into profiles. There are also ways of making it more difficult for these companies to attribute this data to an individual.

Yes, I agree. Creating more distributed forms of risky activity doesn't make the activity much less risky. It just introduces new attack vectors. The answer is to not use services that collect data.

That's my opinion too. In order to secure your data, don't give it out. And don't store it on a server. As you know, servers get hacked.

So what do you store it on, then, if not your own server?

External drives in a safe?

Also: Houses do sometimes get broken into too, but most of us still prefer to live in one.

100 times yes - agreed.

So, you guys never take digital photos (yes, I know, gentlemen take Polaroids), create any digital documents, or keep any logs on your computers?

Of course. :) I don't mean to patronise, and I hope you're not trolling, but I feel like you're missing the point a little. Using a "cloud" to store potentially sensitive documents or information is not comparable to collating your digital movements, usage history, habits, purchase information, etc.

It's not just about controlling the information; it's about collecting it in the first place. I want to store photos, and I need to store sensitive documents in a place where I can access them easily and securely. These things are important to me. To me – individually – because they relate to me.

Collecting information about my online activity is not important to me. It is, however, important to advertisers, data brokers, and other players on some arbitrary scale of nefariousness.

But you would just collate the data you want to manage on this service, correct?

Frankly, I wouldn't collate any of my data on this service.