Hacker News
Privacy is an afterthought. Here's how devs can easily make it better. (stackoverflow.blog)
85 points by c1ll1an 1794 days ago
9 comments

Quite a complex approach, where most will be able to gain a lot with some very simple actions:

- don't collect data you do not absolutely need to service the user

- do not use third-party libs or services when you do not understand how they handle the data you submit to them
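
A minimal sketch of the first point, assuming a signup flow that only needs an email and a display name. The field names and the `minimize` helper are made up for illustration, not taken from any real library:

```python
# Hypothetical allowlist: the only fields this imaginary signup flow needs.
ALLOWED_SIGNUP_FIELDS = frozenset({"email", "display_name"})


def minimize(payload: dict, allowed: frozenset) -> dict:
    """Return a copy of payload that contains only allowlisted keys."""
    return {key: value for key, value in payload.items() if key in allowed}


# The client sent more than we need; everything outside the allowlist is dropped
# before it ever reaches storage.
submitted = {
    "email": "a@example.com",
    "display_name": "A",
    "birthdate": "1990-01-01",
    "phone": "555-0100",
}
stored = minimize(submitted, ALLOWED_SIGNUP_FIELDS)
print(stored)
```

An allowlist fails closed: a new field added upstream is not stored until someone deliberately adds it, which is roughly the "do not collect what you do not need" default.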

Completely agree pintxo, doing these basic things goes a long way to ensuring there's a privacy-first mindset when building anything. The question I’m looking to answer as we work on this problem at Ethyca is how we make it easy for any developer to bake that mindset into what they’re doing when they’ve got a bunch of other objectives and the detail of data minimization as a concept often simply isn’t their area of expertise. It's rather like security: we all need to do it and only some are subject matter experts, but we all have to fold more security thinking into our work. I think we can make that far easier for every developer, whatever part of a system they’re working on.
The data collection part IMHO is a prime responsibility of the product owner. She needs to clarify what should be collected and also what should NOT be collected.

The latter part, meanwhile, is the prime responsibility of the developer team. You need a culture of skepticism towards 3rd party access to your (customers, users, company) data.

That second point is very interesting. Beyond reading code / SLA for the lib, I'm not sure there's an easy (read: time efficient) way to understand what data points are used for what purposes currently. At least it seems that would hold for a lot of services.

Am I missing something here?

The easy way is a Data Processing Agreement, which has to precisely list what data is processed and in which way.

This is of course a legal document and the implementation may do something else.

Right - you've nailed it, a legal document like a data processing agreement may be enforceable in court but the system implementation can diverge from it widely, often without malice, and it still fails.

So the question to answer is how can we ensure an interoperable contract for data between systems/services - that requires an ontology for privacy that makes enforcement easy(er).

It is possible to make privacy definitions a declarative and low effort part of development for engineers - then code becomes the enforcing layer instead of legal agreements.

Considering that blatant GDPR breaches by Google and Facebook, such as their non-compliant consent flows, have gone unpunished, I would not trust a legal document when there’s previous evidence that you can break the law and successfully get away with it.
Exactly Nextgrid - this is on developers to solve. Data flowing through a system isn't policed by the legal agreement, it's developers who understand where/how data is being used - we're the ones who can fix this.
It basically boils down to: use as little third-party code and as few third-party services as possible, because vetting them is expensive and error prone (you usually don't have enough insight to judge confidently). Prefer code over services.
In my experience the devs usually aren't the obstacle when it comes to implementing privacy, it's a bit higher up the chain...
I dunno, machine learning engineers were a key force against privacy at one company I once worked at. As an ML engineer, your whole career sort of depends on convincing the company you work at to collect as much information as possible from their users, so that you can then run it through magical algorithms to make it look like you're providing value to the company. (There are exceptions, but entire subfields like "recommendation systems" depend on this--startups aren't hiring ML engineers to get better at playing chess.)
I've also seen plenty of projects which did not use proper authentication and authorization because devs were too lazy to implement it or to learn the platform they were developing on (e.g. AWS). It's an indirect effect on privacy, but when it hits it hits hard.
Fair @isbvhodnvemrwvn - I'd say it's a combination of laziness, lack of knowledge perhaps or both?

I'm conscious that it's easy to say "do _____ better" and insert a soap box like privacy or security; but if you don't provide developers with tools, whether that's libraries, IDE plugins, linters or code analysis tools to make that task easier, it's almost impossible - we can't ask developers to be privacy experts, but we can make that expertise readily available to them.

Like the example you took - provisioning cloud infrastructure is a heck of a lot easier now than it was a decade ago because of a bunch of orchestration and infra as code tools that ease the burden/knowledge gap for developers. We've got to have the same for anything we care about - in my case, that's privacy.

Really good point csande17 but I'd be nervous to tar all ML folk with the same brush :) I know some incredible data science and ML folks who care deeply about privacy - check out http://openmined.org to see hope in the ML community. There are large numbers of talented engineers who are dedicated to building better data processing systems.

If I can offer my two cents: if we do better at annotating data at the source of ingress (whether that's data provided by users, inferred about them, etc.) such that we can better describe the data we hold, why we have that dataset and for what limited uses, we can then enforce those conditions on models - right now that additional context just doesn't exist, so privacy-type enforcement on ML becomes arbitrary and subjective, based on a team's needs. We can do so much better if we just describe what, where, how and why we've collected data in our systems - then enforcement is layered on top of that.
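
To make the annotate-at-ingress idea a bit more concrete, here is a rough sketch. Everything in it (the `Annotated` wrapper, the category and purpose names, `PurposeError`) is hypothetical and only illustrates the shape of the idea, not any existing tool:

```python
from dataclasses import dataclass


class PurposeError(Exception):
    """Raised when data is requested for a purpose it was not collected for."""


@dataclass(frozen=True)
class Annotated:
    """A value plus the context recorded when it entered the system."""
    value: object
    category: str            # what it is, e.g. "contact"
    source: str              # how we got it, e.g. "user_provided" or "inferred"
    allowed_uses: frozenset  # why we hold it: the purposes it may be used for


def use(field: Annotated, purpose: str):
    """Release the raw value only if the declared purpose is allowed."""
    if purpose not in field.allowed_uses:
        raise PurposeError(f"{field.category} data may not be used for {purpose}")
    return field.value


# Annotated once, at ingress.
email = Annotated(
    value="a@example.com",
    category="contact",
    source="user_provided",
    allowed_uses=frozenset({"account_login", "billing"}),
)

use(email, "billing")  # allowed purpose, returns the raw value

try:
    use(email, "model_training")  # not a declared purpose
except PurposeError as err:
    print("blocked:", err)
```

The point is only that once the "what, where, how and why" travels with the data, a training pipeline can be refused by code rather than by someone remembering a policy document.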

Interviewer: Do you have any questions for us?

Candidate: Will this job require me to violate anyone's privacy?

Hold on, how about this instead.

Candidate: Will this job require me to do anything illegal?

Candidate already knows the answers to these questions. She has no need to ask.

This "the boss made me do it" defense seems to be a recurring comment on HN, perhaps from those with a guilty conscience. But is it really persuasive? It is like asking the reader to have empathy for a drug dealer selling fentanyl because "he really needs the money and no one else will hire him".

"I really just want to sell marijuana but the people higher up the chain decided we should sell opiates instead."

This is valid but potentially a slightly different issue, insofar as I don't believe that the work we're doing at Ethyca affects the moral compass of a company. There are certainly many businesses we actively choose not to work with and I won't start a war here but I think that's a vital point - trust and integrity are part of building anything that's safe.

Otherwise how do we know a bridge is structurally sound enough to walk across? We have to trust in the checks/balances, systems and people that work on that infrastructure. The same should apply to any software that affects large numbers of people.

If a company is setting out to do ill, that's a societal failing we should all care about and hope to prevent, but it won't be solved just with technical measures.

And yet devs can't entirely wash their hands of responsibility, since no one forces them to work at companies that don't respect privacy.
> no one forces them to work at companies that don't respect privacy.

I'll move if my customers move too.

And these Mexican standoffs are why we have governments.

If nobody wants to move first, we have to all vote to move at the same time.

https://slatestarcodex.com/2014/07/30/meditations-on-moloch/

> There’s a passage in the Principia Discordia where Malaclypse complains to the Goddess about the evils of human society. “Everyone is hurting each other, the planet is rampant with injustices, whole societies plunder groups of their own people, mothers imprison sons, children perish while brothers war.”

> The Goddess answers: “What is the matter with that, if it’s what you want to do?”

> Malaclypse: “But nobody wants it! Everybody hates it!”

> Goddess: “Oh. Well, then stop.”

Maybe for the "big" decisions (like whether or not to store user data in the first place), but in the myriad of small decisions (like whether to copy the entire user object into an analytics table or just a pseudonymized id) I do think the average dev isn't as "respectful" as you think...

YMMV of course
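
For what it's worth, the "just a pseudonymized id" version of that small decision can be short. A sketch using Python's standard `hmac` and `hashlib` modules; the key value, the `pseudonymize` helper and the event shape are assumptions for illustration:

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets store, not source code.
ANALYTICS_KEY = b"replace-with-a-real-secret"


def pseudonymize(user_id: str, key: bytes = ANALYTICS_KEY) -> str:
    """Derive a stable pseudonym for user_id with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


user = {"id": "42", "email": "a@example.com", "name": "A. Person"}

# Instead of copying the whole user object, the analytics row carries only
# the pseudonym and the event itself.
event = {"user": pseudonymize(user["id"]), "action": "page_view"}
print(event)
```

The same user always maps to the same pseudonym, so counts and funnels still work. It is pseudonymization rather than anonymization, though: whoever holds the key can re-link the ids, so the key needs protecting too.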

Fair - but I'm not sure we've made it technically easy on ourselves as developers to be "respectful". Right now it's expected that a dev might understand enough about the myriad of privacy laws that might affect their company's tech, the policies the company has and then design models that always enforce that safely. That seems a big ask - again, I agree with you that on the surface it might seem the average dev isn't as "respectful" as you'd think, but we should start by providing a set of tools/way of implementing privacy that make it feasible.
This is reminiscent of the early days of online shopping. When online shopping started to take off, the risk of having your credit card information stolen was higher than today. This was because credit cards were transmitted using the same means as other information at the time. That is to say, insecurely. It wasn’t until later on that we began to see standards arise and libraries emerge that addressed the issue. This isn’t to say that the developers of the era didn’t care. They simply didn’t have the means to achieve what was needed. I think that we will see the same with data privacy in the future. Not every developer is a privacy expert and I don’t think that will ever be the case. Someone will come up with a set of tools accessible to every developer that will solve these issues in a standard way. I’ve seen it happen before and I’ll see it again. The old becomes new, history rhymes, etc etc.
Fair point TheChaplain - I'd actually argue it's across industry. It's a combination of a lack of knowledge about what privacy really is (a lot of confusion over that vs. security for example) and then how to ensure privacy for a user.

You're right that it's at every layer in any complex organization but I'm bullish on the belief that developers hold a lot of the solutions here and can make it happen even in a complex organizational hierarchy with competing interests. We first need better tools to make it easier to implement.

Where I work devs usually have the most force. We know privacy laws, we care about not collecting bullshit data that might ruin our day when our org loses it later. We are not a software company however, software is mostly for us and our customers.
@atoav - this is awesome to hear! I'd love to know more about where you work and how you guys have done such a good job of helping devs understand and drive privacy. It doesn't really matter if the software's end-user is internal teams or customers, they're all people and privacy matters across the board.
It is everyone. For instance, in my role as DBA over 15 years ago we went ahead with a relational data model without much thought about 'how do I delete a user'. A few years later, I needed to write a script to scrub a user from the complex schema. Dozens of special cases 'how do we repair the data if we remove this row or alter this foreign key', cascading, and trawling the foreign key definitions in the database schema to write tests to automatically pick up when the schema was changed and fail integration tests if the script wasn't updated to cope... great fun writing that horror, but slow and should never have needed to exist with a bit of forethought. To this day removing an account is a slow, asynchronous process. And now with GDPR nobody should make that sort of mistake again since they should be aware.
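
A small sketch of the "bit of forethought" version, using SQLite through Python's standard `sqlite3` module. The schema and the `tables_blocking_user_delete` helper are invented for illustration; the idea is to declare delete behaviour on every foreign key and to have a test that fails when a new table references `users` without one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
);
CREATE TABLE audit_log (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id)  -- no delete rule declared
);
""")


def tables_blocking_user_delete(conn: sqlite3.Connection) -> list:
    """List tables whose foreign key to users has no cascade/set-null rule."""
    blocking = []
    names = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in names:
        # Row layout: id, seq, referenced table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            if fk[2] == "users" and fk[6] not in ("CASCADE", "SET NULL"):
                blocking.append(name)
    return blocking


# An integration test could assert this list is empty and fail on schema drift.
print(tables_blocking_user_delete(conn))

# With the cascade declared, removing the user also removes their orders.
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1)")
conn.execute("DELETE FROM users WHERE id = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(orders_left)
```

This does not cover the hard cases described above (rows that must be repaired rather than removed), but it does move "how do I delete a user" into the schema on day one instead of into a separate script years later.
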
Commendable work @Stubish on writing that script. You've nailed the exact problem that I'm referring to here.

Strong privacy is a kind of anachronistic issue in that the regulations came long after many systems were designed/built, and also long after most of the common methods we still use to design systems. So consistent data deletion across a distributed system should be easy, but of course in truth, as you know, it's a nightmare and often a brittle solution that needs to be updated as your software continues to change.

This specific example you've taken is a really good one of how painful privacy can be and how avoidable this issue should be for all devs, both software and data teams.

The only way to make it part of the original requirements and not an afterthought is to start making it like in other industries.

Exemplary punishment for any security exploit gone wild.

Management will start getting the required resources to make it happen accordingly.

There is a risk that well-intentioned regulations become a barrier to entry that only large, well-resourced organizations can meet.

For example, what if "management" is a one or two person startup?

Maybe punishment is not the answer, but rather liability insurance coverage requirements. Or treat it like workers compensation where a small tax funds an insurance pool. And make it so repeat offenders get charged an increasingly higher tax rate.

The same issues that a one- or two-person restaurant startup has to deal with, for example.

When kitchen cleanliness, plumbing, food quality and preservation, cutlery, access for disabled people, ... become an afterthought, it is time to be shut down by the government's consumer protection agency, though usually they get a one-time warning first.

Or maybe not, depending on the country, but then expect what might be great food with interesting side effects.

Definitely true that enforcement has to be proportional to both the infraction and the size of the organization.

However, I would argue that there are plenty of very small companies that also take advantage of that. i.e. very high growth, early stage companies with loads of vc backing that don't prioritize this because they're small and there's only two founders.

All I mean to say here is, again I'm not a regulator, however we do need a way to enforce against bad behavior.

In the 1950s no one wanted seatbelts, neither car owners/the public nor auto manufacturers. Today no one would get in a car without seatbelts without thinking it was weird/crazy. Sometimes we have to enforce rules to drive change, otherwise bad behavior (particularly at large companies) goes unchecked.

Penalties can be both severe and proportionate to revenue at the same time, as one option.
Right b3morales - they can and should be proportionate to the size of the company, its revenue and the type of regulatory infraction - all of these must be evaluated but it doesn't mean we shouldn't enforce.

To take the restaurant example, a mom-and-pop restaurant may have fewer resources to bring to bear on cleanliness and safety, but if it consistently, knowingly persists in doing something that makes its patrons ill - is it any less at fault than a chain of restaurants that does the same? The fine may be proportional to that organization - that's the goal with the GDPR's revenue % based fine format but it could/should go further for large companies that consistently fail.

There's an old saying: "politics is downstream from culture". Well, so is privacy.
That's an interesting one - I'm not sure I follow TeeMassive, do you mean to say that privacy is unimportant culturally?

If that's the case, we probably differ on POV a little so I'd love to know more about why you think this?

In my experience average people who don't work in tech (non devs) do not understand privacy or data use in a system - so it's hard for them to comprehend the potential impact. They're trusting people that work in tech (devs and others) to do the right thing and this is where the problem might be; we're being trusted to ensure we don't abuse or accidentally misuse a position of tremendous knowledge and power.

My parents certainly don't understand where/how their data is stored or used when they use their phone - aren't we then responsible for keeping those people who don't know safe?

We're both agreed on this point @pjmlp - regulations must have teeth and that often takes time. I can't speak to it, as it's not my area of expertise but we also have allowed a type of golem to form, in that big tech is now "too big to fail" and many governments struggle with balancing their ability to enforce against the level of employment that tech provides their country (look at Ireland for a prime example of how tough this balance is)

TL;DR: I agree with you, but I'm not a politician and can't effect change there, so I'll keep chasing a realistic solution that makes it easier for devs to do this of their own accord, as we (the dev community) are pretty good at solving things when we turn our minds to it.

I think a huge chunk of bad privacy outcomes arise through data retention policies and aggregation across many sources.

To operate, a phone company might need to know where you are calling from and who you are calling, a doctor's office might need to know your medical and contact info, an ISP might need to know your IP address, a dating website might need to know your IP address, your chat app might need to know your contact list, your GPS might need to know your precise location at this specific time.

Do they need to know them for years though? And once all this info is aggregated, how personal is the information that can be learned?
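
As a rough illustration of what "not for years" could look like in code, here is a sketch of a per-kind retention window. The kinds of data and the day counts are made-up assumptions, not recommendations:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per kind of data; the numbers are illustrative only.
RETENTION = {
    "call_location": timedelta(days=90),
    "ip_address": timedelta(days=30),
}


def is_expired(record: dict, now: datetime) -> bool:
    """True once a record has outlived the retention window for its kind."""
    return now - record["collected_at"] > RETENTION[record["kind"]]


def purge(records: list, now: datetime) -> list:
    """Keep only the records that are still inside their retention window."""
    return [r for r in records if not is_expired(r, now)]


now = datetime(2021, 6, 1, tzinfo=timezone.utc)
records = [
    {"kind": "ip_address", "collected_at": now - timedelta(days=45)},     # past 30 days
    {"kind": "ip_address", "collected_at": now - timedelta(days=5)},      # still needed
    {"kind": "call_location", "collected_at": now - timedelta(days=45)},  # within 90 days
]
kept = purge(records, now)
print(len(kept))
```

A record kind with no entry in `RETENTION` raises a `KeyError` here, which is deliberate in this sketch: data with no declared retention period should not be silently kept forever.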

Really good point fitblipper - I see this all the time.

Organizations have poor/no data retention policies so they accumulate information they no longer need for many years and on the other side they continue to build new data processing capabilities that might leverage that old data which lacks any context as to what it was gathered for or how it can (or cannot) be used.

On the point of what degree of identification can be learned (i.e. how personal is the information), I'm constantly reminded by folks in the ML field that you can discern a huge amount about an individual quite precisely from what might initially look like anonymized data - that's one of the exploits that most concerns privacy specialists when they talk about differential privacy and pseudonymization - arguably we're not where we need to be yet but thankfully there are a number of teams working on solutions to this.

By this I mean the data analysis part; the retention and policy issue, that's still one most companies need to do better on.

This is so thoughtful to learn about and understand. Really enjoyed reading this.
The biggest issue IMO is that most software begins as a startup, and despite becoming big companies later, the startup mindset and habits don't really change.

Before product-market fit, you don't really know what data you need and what data you'll need. You can't go back in time to collect data, you need to do it now if there's a chance it will be useful later. In a pre-seed company (or even at seed), you don't have the resources to audit every package much less get SLAs in place. Most companies I know do the bare minimum for GDPR, and it's not a lack of care for the user, but that often it's not the best place to deploy resources when considering the survival of a company.

Most software starts out being assembled in flight with multiple changes in destination - and those habits stick.

Privacy and most other things are always an afterthought for closed software.
Agreed @swiley. I'm usually heartened when I remember that security wasn't dissimilar a few years ago and while it arguably still has a lot of challenges, security is part of a healthy dev mindset today. We've a way to go for privacy but I believe by building the right tools to make it easier for devs, this will happen.
This is extremely useful. Absolutely incredible
Every developer who has been in the industry longer than a year is almost certainly part of two or more major data breaches. I can only speak for myself but I take privacy and security very seriously, not only for myself but in architecting systems to protect client data as much as possible (and to collect as little as is absolutely necessary).
That is heartening to hear - your strong attitude to privacy I mean @mikece. Agree with you that if you've worked in software for long enough a breach of some kind is an inevitability and that's really important for every dev to understand.

I still meet so many who are laboring under the impression that it won't happen to them - it's just not the case so we all have to have a privacy and security first mindset.