Hacker News
by morsch 2223 days ago
Any customer data, and especially PII, needs to be toxic. The toxicity needs to increase super-linearly with the total amount of data, because the value of a leak does, too, while the difficulty of the breach probably does not.

It needs to be so expensive to store extensive data of millions of people that companies (or for that matter, the government) cannot wait to get rid of it.

Currently, most online shops nudge me towards opening an account and letting them store my data indefinitely (to facilitate marketing and reduce friction). They should do the opposite: nudge me towards not causing them the hassle of storing my data beyond the immediate business transaction.

6 comments

I built an app back in 2012 for emotional journaling and I tried to collect as little data as possible from the user because I didn’t want to have the burden and legal responsibility to guard all that deep data. Many people in SV told me I was crazy not to collect data. It does make it harder to develop the app with so much uncertainty about how people are using it, yet I felt much more free knowing I wasn’t one hack away from exposing people’s lives.
Why not ask them?
I'm not sure to which part you're referring...ask them what?
I believe he's suggesting that since you don't collect any more data than necessary, you ask the users how they use your app instead.
ah, yes, I do ask them when I get the chance. Sometimes it would be a lot easier to just monitor them without having to ask. I mean, I'm sure it's a sliding scale of privacy/automation, but I still like doing it more manually, or rather, with more intentional consent.
Collecting data is just a symptom of a bigger disease called "marketing". You eradicate that disease and suddenly there's no reason to collect more data than necessary, along with a near-infinite list of other quality of life improvements like the lack of dark patterns, email and push spam ("newsletters" and "offers" as they call them), etc.
In that case the data was clearly needed for the essential service EasyJet provides, it wasn't marketing data. According to the article, the data stolen was email addresses and travel details, and for some customers credit card details (for customers who asked EasyJet to save them)
I think the point is that the company keeps the data around forever when they only need it for a few months after the trip, at most.
When I was applying for UK citizenship, they needed 5 years of all travel data (entry and exit into the UK). EasyJet's databases (and other flight providers') proved useful. Of course, you could wonder why the government even needs this data, or why they don't keep it themselves (I don't for a moment believe that UK government / spy agencies don't have all data about all flights in and out of the UK).
This is how I’m building my startup[1]. All data stays with the customer and we actively don’t want it, because that’s how I wish all my products worked. I suspect you will see more startups who treat data more respectfully in the future, as the next wave of founders have experienced the consequences of unrestricted data collection.

Having said that, I also think a large part of the problem is that treating data like toxic waste is hard. There are more established patterns for data collection than data destruction. How do you know when it’s safe to delete some piece of data? What if the user comes back and complains about a transaction after you’ve deleted the associated data?

1: https://hiome.com

Exactly. A lot of businesses MUST keep the data.

Imagine EasyJet putting the burden of keeping all your transaction logs on you: "Passenger assumes responsibility for downloading this electronically signed package and keeping it for 2 years"

On a completely tangential note: How does your product work with pets?

Ha, that makes me wonder if we could have a future standardized protocol where your browser handles the responsibility of storing a signed package of data, and sending it back to the company when needed. Basically treat each package of data like a product that might need to be RMA'd if there's an issue. Obvious first question is what happens when you switch browsers/devices.
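To make the "signed package" idea a bit more concrete, here is a minimal sketch, not an existing protocol. It assumes the company signs each record with a secret key it keeps (HMAC-SHA256), hands the package to the customer, and later checks the signature when the package is sent back. The function names and the record fields are hypothetical.

```python
import hashlib
import hmac
import json


def sign_package(record: dict, secret_key: bytes) -> dict:
    """Serialize a record deterministically and attach an HMAC signature.

    The company keeps only secret_key; the customer stores the returned package.
    """
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret_key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def verify_package(package: dict, secret_key: bytes) -> bool:
    """Check that a package handed back by the customer was not altered."""
    expected = hmac.new(
        secret_key, package["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, package["signature"])


if __name__ == "__main__":
    key = b"hypothetical-company-secret"
    package = sign_package({"order_id": "A123", "total": "49.90"}, key)
    print(verify_package(package, key))
```

With a shared-secret HMAC only the company can verify the package. If the customer or a third party should be able to verify it too, a public-key signature would be the natural swap, at the cost of key management.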

Regarding pets: it'll depend on the size of your pet. For most people, the sensors properly ignore pets, but they can be confused by large dogs. You can adjust the sensitivity of the sensor, so it's generally only an issue if you have both large dogs and small children, and only want to count one of them. We're working on a software update that should help that scenario too. Feel free to send me more questions at neil@hiome.com :)

The government wants many companies to keep certain data, to prevent fraud by the customers (and sometimes the businesses). Decentralizing the data makes such frauds (including tax fraud) more difficult to audit or detect, so it seems unlikely that governments will permit it.
But wait, isn't this exactly how MOST businesses operate today? I certainly can't go to my local dry cleaners and request the transaction data for something that happened 2 years ago, much less any sort of metadata about that transaction (3 shirts, one blue two white, no starch). The normal principle most businesses adhere to is a strictly limited time period of "memory" of any particular transaction or interaction, after which it is solely the customer's responsibility to keep records.
> There are more established patterns for data collection than data destruction. How do you know when it’s safe to delete some piece of data?

I agree, but this is exactly analogous to the SDLC. Most coders only learn to hack together barely-working code. Those who spend the effort to learn the craft figure out how to {version control, unit test, static analysis, benchmark, integration test, upgrade library dependencies} and automate these processes.

Similarly, there needs to be a data lifecycle with defined retention lifetimes for different data, defined processes for actually disposing of data, and special handling cases for backup blobs (which may be retained longer than the retention lifetime of a subset of the data in the backups). This is effectively what the GDPR and similar laws intend (not sure if it states this explicitly).
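A rough sketch of what "defined retention lifetimes for different data" could look like in code. The data kinds and durations in `RETENTION` are made-up placeholders, not legal guidance, and `Record`, `is_expired` and `purge` are hypothetical names. Backups are deliberately left out, since they need their own handling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention lifetimes per kind of data (placeholder values).
RETENTION = {
    "booking": timedelta(days=180),
    "payment_record": timedelta(days=730),
    "support_ticket": timedelta(days=90),
}


@dataclass
class Record:
    kind: str
    created_at: datetime


def is_expired(record: Record, now: datetime) -> bool:
    """A record is expired once it has outlived its kind's retention lifetime."""
    return now - record.created_at >= RETENTION[record.kind]


def purge(records: list[Record], now: datetime) -> list[Record]:
    """Return only the records that are still within their retention lifetime."""
    return [r for r in records if not is_expired(r, now)]


if __name__ == "__main__":
    now = datetime(2020, 6, 1, tzinfo=timezone.utc)
    records = [
        Record("booking", datetime(2019, 1, 1, tzinfo=timezone.utc)),
        Record("payment_record", datetime(2019, 1, 1, tzinfo=timezone.utc)),
    ]
    for kept in purge(records, now):
        print(kept.kind)
```

Run on a schedule, something like this makes deletion the default outcome: a kind with no entry in `RETENTION` raises a `KeyError` rather than being kept silently.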

Startups now have to think about things like GDPR and Cali's laws, so they have to think about this data more -anyway-.

> I also think a large part of the problem is that treating data like toxic waste is hard.

Yep. It's a -lot- of extra work to do. It's a balancing act between:

- Keeping data long enough to satisfy govt regulations, rulings, or existing contracts with your vendors (e.g. a merchant account with a bank for CC processing). You can't just order something from Amazon, send a GDPR request, and expect all your data to be gone; they can't delete it until -after- those retention periods have expired.

- Following regulations like GDPR/Cali privacy law.

- Still doing meaningful things with the data.

Generally speaking, I'd say this is all stuff that makes a Data architect very handy in the modern climate.
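The first point in that list (an erasure request cannot be honored until every retention obligation has run out) can be sketched roughly as below. The obligation names and day counts are invented for illustration, and `earliest_erasure_date` and `handle_erasure_request` are hypothetical helpers, not anything a regulation prescribes.

```python
from datetime import date, timedelta


def earliest_erasure_date(transaction_date: date, obligations: dict[str, int]) -> date:
    """Return the first date on which every retention obligation has expired.

    obligations maps an obligation name to the number of days the data
    must be kept, counted from the transaction date.
    """
    if not obligations:
        return transaction_date
    return max(transaction_date + timedelta(days=days) for days in obligations.values())


def handle_erasure_request(
    request_date: date, transaction_date: date, obligations: dict[str, int]
) -> tuple[str, date]:
    """Decide whether to erase immediately or schedule the erasure for later."""
    allowed_from = earliest_erasure_date(transaction_date, obligations)
    if request_date >= allowed_from:
        return ("erase_now", request_date)
    return ("erase_on", allowed_from)


if __name__ == "__main__":
    # Hypothetical obligations, in days after the transaction.
    obligations = {"tax_records": 365, "chargeback_window": 120}
    print(handle_erasure_request(date(2020, 6, 1), date(2020, 1, 1), obligations))
```

The point of returning a scheduled date instead of a refusal is that the request is still honored automatically once the last obligation lapses.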

I think the saying is: 'data is not the new oil, it's the new uranium'
Exactly, just let my browser fill in the data. And we pray that Mozilla can keep it safe :)
I don't see how that helps more than marginally.

Doesn't PCI require a payment processor to keep some amount of the transaction data for a specific period of time?

Personally, I love tokenized transactions / specialized payment processors (eg Apple Pay, Stripe, PayPal) because they actively work to keep most of the data away from etailers (who are generally not specialists in securing their checkout flow). The problem is the payment processor transaction fees can be steep (2.x% for most commodity CC processors all the way up to 15% for Apple Pay), so etailers lose on the margins and avoid the more secure options.

I think the GDPR has gone some way towards that.

Certainly in my own business, I want as little PII as possible.