HN Mirror

by mcotton 1623 days ago

This could potentially have very interesting consequences for ML models. It will depend if their training data is considered part of the source code.

If there was a loss-prevention-specific video analytic that flagged a person’s behavior as abnormal, would the person have a right to audit the source code for the CNN and/or the training data that was used in the development of that analytic?

As someone working on such analytics, it could become a real adventure to comply with that. My dataset came from customers who agreed to share positive/negative examples with me, but not necessarily for me to share them publicly. The privacy of the people in the shared examples would also need to be considered.

6 comments

noizejoy 1623 days ago

The argument could be made that private ML training datasets with public consequences are a fundamental problem.

karpierz 1623 days ago

Wouldn't that just mean you'd have to make sure the data doesn't contain private identifying information to start with? Seems like a win-win for the people in datasets and the people who are auditing the code.

hutzlibu 1623 days ago

"Wouldn't that just mean you'd have to make sure the data doesn't contain private identifying information to start with"

That condition is not easy.

It is very hard to have data about people-related things without private identifying information, especially now that there is face recognition and the like.

Nullabillity 1623 days ago

Then ML simply isn't a viable solution to that problem?

int_19h 1623 days ago

If the person does not have the right to audit the model - i.e. determine how and why exactly it flagged them - when being flagged has consequences for their interaction with the government, I would argue that it's a violation of due process. If it's impossible to meet that standard with ML in a satisfactory way, then perhaps ML should not be used in those contexts at all?

pabs3 1623 days ago

I found the Debian Deep Learning Team's Machine Learning policy (which covers this kind of stuff) very interesting:

https://salsa.debian.org/deeplearning-team/ml-policy

Kinrany 1623 days ago

> It will depend if their training data is considered part of the source code.

It has to be; otherwise you could argue that JavaScript is data dynamically loaded by an open-source browser.

Zigurd 1623 days ago

Why should they not? Compliance is hard even for very mundane things.
