Hacker News
by csande17 1794 days ago
I dunno, machine learning engineers were a key force against privacy at one company I once worked at. As an ML engineer, your whole career sort of depends on convincing the company you work at to collect as much information as possible from their users, so that you can then run it through magical algorithms to make it look like you're providing value to the company. (There are exceptions, but entire subfields like "recommendation systems" depend on this--startups aren't hiring ML engineers to get better at playing chess.)

I've also seen plenty of projects which did not use proper authentication and authorization because devs were too lazy to implement it or to learn the platform they were developing on (e.g. AWS). It's an indirect effect on privacy, but when it hits it hits hard.
Fair @isbvhodnvemrwvn - I'd say it's laziness, lack of knowledge, or perhaps both?

I'm conscious that it's easy to say "do _____ better" and fill in the blank with a soapbox issue like privacy or security; but if you don't provide developers with tools to make that task easier, whether that's libraries, IDE plugins, linters or code analysis tools, it's almost impossible. We can't ask developers to be privacy experts, but we can help make that expertise readily available to them.

Like the example you took - provisioning cloud infrastructure is a heck of a lot easier now than it was a decade ago because of a bunch of orchestration and infra-as-code tools that ease the burden/knowledge gap for developers. We've got to have the same for anything we care about - in my case, that's privacy.

Really good point, csande17, but I'd be nervous to tar all ML folk with the same brush :) I know some incredible data science and ML folks who care deeply about privacy - check out http://openmined.org to see hope in the ML community. There are large numbers of talented engineers who are dedicated to building better data processing systems.

If I can offer my two cents: if we do better at annotating data at the source of ingress (whether that's data provided by users, inferred about them, etc.) so that we can describe the data we hold, why we have that dataset, and for what limited uses, we can then enforce those conditions on models. Right now that additional context just doesn't exist, so privacy enforcement on ML becomes arbitrary and subjective, based on a team's needs. We can do so much better if we just describe what, where, how and why we've collected data in our systems - then enforcement is layered on top of that.
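
To make the annotate-then-enforce idea a bit more concrete, here's a rough Python sketch. Everything in it is made up for illustration (FieldAnnotation, select_fields, PurposeViolation, the field names and the purpose strings); it isn't from any real library, and a real system would need a lot more than this. It assumes each field gets tagged at ingress with what it is, where it came from and which uses it was collected for, and that anything unannotated is denied by default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAnnotation:
    """Hypothetical metadata attached to a field at the point of ingress."""
    name: str
    category: str            # what: e.g. "contact", "behavioural"
    source: str              # where/how: "user_provided" or "inferred"
    allowed_uses: frozenset  # why: the purposes this data was collected for


class PurposeViolation(Exception):
    """Raised when a field is requested for a use it was not collected for."""


def select_fields(annotations, requested, purpose):
    """Return the requested field names, or raise if any is not allowed for `purpose`.

    Fields with no annotation at all are treated as denied (an assumption of
    this sketch, so that missing context fails closed).
    """
    by_name = {a.name: a for a in annotations}
    denied = [
        name for name in requested
        if name not in by_name or purpose not in by_name[name].allowed_uses
    ]
    if denied:
        raise PurposeViolation(f"{denied} not permitted for purpose {purpose!r}")
    return list(requested)


# Illustrative annotations only; the fields and purposes are invented.
ANNOTATIONS = [
    FieldAnnotation("email", "contact", "user_provided",
                    frozenset({"account_management"})),
    FieldAnnotation("watch_history", "behavioural", "user_provided",
                    frozenset({"recommendations"})),
    FieldAnnotation("predicted_income", "financial", "inferred", frozenset()),
]

if __name__ == "__main__":
    # A recommender asking only for data collected for recommendations is fine.
    print(select_fields(ANNOTATIONS, ["watch_history"], "recommendations"))
    # Asking for email under the same purpose is rejected.
    try:
        select_fields(ANNOTATIONS, ["watch_history", "email"], "recommendations")
    except PurposeViolation as err:
        print("blocked:", err)
```

The point is that the training pipeline never has to make a judgment call: the check is mechanical once the what/where/how/why has been recorded alongside the data.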