I dunno, machine learning engineers were a key force against privacy at one company I once worked at. As an ML engineer, your whole career sort of depends on convincing the company you work at to collect as much information as possible from their users, so that you can then run it through magical algorithms to make it look like you're providing value to the company. (There are exceptions, but entire subfields like "recommendation systems" depend on this--startups aren't hiring ML engineers to get better at playing chess.)
I've also seen plenty of projects which did not use proper authentication and authorization because devs were too lazy to implement it or to learn the platform they were developing on (e.g. AWS). It's an indirect effect on privacy, but when it hits it hits hard.
Fair @isbvhodnvemrwvn - I'd say it's laziness, lack of knowledge, or perhaps both?
I'm conscious that it's easy to say "do _____ better" and insert a soapbox like privacy or security; but if you don't provide developers with tools to make that task easier, whether that's libraries, IDE plugins, linters or code analysis tools, it's almost impossible. We can't ask developers to be privacy experts, but we can help by making that expertise readily available.
Like the example you took - provisioning cloud infrastructure is a heck of a lot easier now than it was a decade ago because of a bunch of orchestration and infra as code tools that ease the burden/knowledge gap for developers. We've got to have the same for anything we care about - in my case, that's privacy.
Really good point csande17 but I'd be nervous to tar all ML folk with the same brush :) I know some incredible data science and ML folks who care deeply about privacy - check out http://openmined.org to see hope in the ML community. There are large numbers of talented engineers who are dedicated to building better data processing systems.
If I can offer my two cents: if we do better at annotating data at the point of ingress (whether that's data provided by users, inferred about them, etc.) such that we can better describe the data we hold, why we have that dataset and for what limited uses, we can then enforce those conditions on models. Right now that additional context just doesn't exist, so privacy enforcement on ML becomes arbitrary and subjective, based on a team's needs.
We can do so much better if we just describe what, where, how and why we've collected data in our systems - then enforcement is layered on top of that.
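To make the "describe first, enforce on top" idea concrete, here's a minimal sketch in Python. Everything in it is an assumption for illustration: the `FieldAnnotation` shape, the field names and the purpose strings are hypothetical, not any particular tool's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAnnotation:
    """Describes one field at the point of ingress (hypothetical shape)."""
    category: str            # what kind of data this is, e.g. "contact"
    source: str              # "user_provided" or "inferred"
    allowed_uses: frozenset  # the limited purposes it was collected for


# Hypothetical annotations for a hypothetical users dataset.
ANNOTATIONS = {
    "email": FieldAnnotation("contact", "user_provided",
                             frozenset({"account", "billing"})),
    "watch_history": FieldAnnotation("behavioral", "user_provided",
                                     frozenset({"recommendations"})),
    "predicted_income": FieldAnnotation("financial", "inferred", frozenset()),
}


def fields_allowed_for(purpose, annotations=ANNOTATIONS):
    """List the fields whose annotation permits the given purpose."""
    return sorted(name for name, ann in annotations.items()
                  if purpose in ann.allowed_uses)


def select_for_purpose(record, purpose, annotations=ANNOTATIONS):
    """Keep only fields annotated for this purpose.

    Fields with no annotation at all are dropped, so undescribed data
    can't leak into a model by default.
    """
    allowed = set(fields_allowed_for(purpose, annotations))
    return {key: value for key, value in record.items() if key in allowed}
```

With something like this, a training pipeline would call `select_for_purpose(record, "recommendations")` rather than reading raw rows, so the enforcement comes from the description of the data and not from each team's judgment.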
Candidate: Will this job require me to violate anyone's privacy?
Hold on, how about this instead.
Candidate: Will this job require me to do anything illegal?
Candidate already knows the answers to these questions. She has no need to ask.
This "the boss made me do it" defense seems to be a recurring comment on HN, perhaps from those with a guilty conscience. But is it really persuasive? It is like asking the reader to have empathy for a drug dealer selling fentanyl because "he really needs the money and no one else will hire him".
"I really just want to sell marijuana but the people higher up the chain decided we should sell opiates instead."
This is valid but potentially a slightly different issue, insofar as I don't believe that the work we're doing at Ethyca affects the moral compass of a company.
There are certainly many businesses we actively choose not to work with and I won't start a war here but I think that's a vital point - trust and integrity are part of building anything that's safe.
Otherwise how do we know a bridge is structurally sound enough to walk across? We have to trust in the checks/balances, systems and people that work on that infrastructure. The same should apply to any software that affects large numbers of people.
If a company is setting out to do ill, that's a societal failing we should all care about and hope to prevent, but it won't be solved just with technical measures.
> There’s a passage in the Principia Discordia where Malaclypse complains to the Goddess about the evils of human society. “Everyone is hurting each other, the planet is rampant with injustices, whole societies plunder groups of their own people, mothers imprison sons, children perish while brothers war.”
> The Goddess answers: “What is the matter with that, if it’s what you want to do?”
Maybe for the "big" decisions (like whether or not to store user data in the first place), but in the myriad of small decisions (like whether to copy the entire user object into an analytics table or just a pseudonymized id), I'd argue the average dev isn't as "respectful" as you think...
Fair - but I'm not sure we've made it technically easy on ourselves as developers to be "respectful". Right now it's expected that a dev might understand enough about the myriad of privacy laws that might affect their company's tech, the policies the company has and then design models that always enforce that safely.
That seems a big ask - again, I agree with you that on the surface it might seem the average dev isn't as "respectful" as you'd think, but we should start by providing a set of tools/way of implementing privacy that make it feasible.
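As a rough illustration of the analytics-table example above, here's a minimal Python sketch of writing a pseudonymized reference instead of the whole user object. The function names, the row layout and the key handling are all hypothetical; the only real API used is the standard library's `hmac` and `hashlib`.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, key: bytes) -> str:
    """Derive a stable pseudonym from a user id with a keyed hash (HMAC-SHA256).

    The key is an assumption here: it would live outside the analytics
    store, so the table alone can't be re-linked to a user.
    """
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def analytics_row(user: dict, event: str, key: bytes) -> dict:
    """Build the analytics row from the pseudonym and the event only.

    Nothing else from the user object (email, name, ...) is copied over.
    """
    return {"user_ref": pseudonymize(str(user["id"]), key), "event": event}
```

Note that this is pseudonymization, not anonymization: anyone holding the key can recompute the mapping, so the key needs the same care as the original ids.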
This is reminiscent of the early days of online shopping. When online shopping started to take off, the risk of having your credit card information stolen was higher than today. This was because credit card details were transmitted using the same means as other information at the time. That is to say, insecurely. It wasn’t until later on that we began to see standards arise and libraries emerge that addressed the issue. This isn’t to say that the developers of the era didn’t care. They simply didn’t have the means to achieve what was needed. I think that we will see the same with data privacy in the future. Not every developer is a privacy expert and I don’t think that will ever be the case. Someone will come up with a set of tools accessible to every developer that will solve these issues in a standard way. I’ve seen it happen before and I’ll see it again. The old becomes new, history rhymes, etc etc.
Fair point TheChaplain - I'd actually argue it's across the industry. It's a combination of a lack of knowledge about what privacy really is (a lot of confusion over that vs. security, for example) and then how to ensure privacy for a user.
You're right that it's at every layer in any complex organization but I'm bullish on the belief that developers hold a lot of the solutions here and can make it happen even in a complex organizational hierarchy with competing interests.
We first need better tools to make it easier to implement.
Where I work devs usually have the most force. We know privacy laws, we care about not collecting bullshit data that might ruin our day when our org loses it later. We are not a software company however, software is mostly for us and our customers.
@atoav - this is awesome to hear! I'd love to know more about where you work and how you guys have done such a good job of helping devs understand and drive privacy.
It doesn't really matter if the software's end-user is internal teams or customers, they're all people and privacy matters across the board.
It is everyone. For instance, in my role as DBA over 15 years ago we went ahead with a relational data model without much thought about 'how do I delete a user'. A few years later, I needed to write a script to scrub a user from the complex schema. Dozens of special cases 'how do we repair the data if we remove this row or alter this foreign key', cascading, and trawling the foreign key definitions in the database schema to write tests to automatically pick up when the schema was changed and fail integration tests if the script wasn't updated to cope... great fun writing that horror, but slow and should never have needed to exist with a bit of forethought. To this day removing an account is a slow, asynchronous process. And now with GDPR nobody should make that sort of mistake again since they should be aware.
Commendable work @Stubish on writing that script. You've nailed the exact problem that I'm referring to here.
Strong privacy is a kind of anachronistic issue in that the regulations came long after many systems were designed/built, and also long after most of the common methods we still use to design systems. So consistent data deletion across distributed systems should be easy, but of course in truth, as you know, it's a nightmare and often a brittle solution that needs to be updated as your software continues to change.
This specific example you've taken is a really good one of how painful privacy can be and how avoidable this issue should be for all devs, both software and data teams.
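For what it's worth, the "fail integration tests if the script wasn't updated" idea from the DBA example above can be sketched in a few lines. This is a minimal Python sketch against SQLite, not the script described in that comment: the `users` table name and the `HANDLED_TABLES` set are hypothetical, and other databases expose foreign keys differently (SQLite's `PRAGMA foreign_key_list` is used here).

```python
import sqlite3

# Hypothetical: the tables the user-deletion script is known to clean up.
HANDLED_TABLES = {"orders", "sessions"}


def tables_referencing(conn, target):
    """Return the names of tables that have a foreign key pointing at target."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    found = set()
    for table in tables:
        # Each row is (id, seq, referenced_table, from_col, to_col, ...).
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == target:
                found.add(table)
    return found


def unhandled_tables(conn, target="users", handled=HANDLED_TABLES):
    """Tables that reference target but that the deletion script ignores."""
    return tables_referencing(conn, target) - set(handled)
```

An integration test would then assert that `unhandled_tables(conn)` is empty, so adding a new table with a foreign key to `users` breaks the build until the deletion script learns about it. It only catches declared foreign keys; implicit references (ids stored without a constraint) would still slip through.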