by mcotton 1623 days ago
This could potentially have very interesting consequences for ML models. It will depend if their training data is considered part of the source code.

If a loss-prevention video analytic flagged a person's behavior as abnormal, would that person have a right to audit the source code for the CNN and/or the training data used to develop that analytic?

As someone working on such analytics, I can see that complying with that could become a real adventure. My dataset came from customers who agreed to share positive/negative examples with me, but not necessarily for me to share them publicly. The privacy of the people in the shared examples would also need to be considered.

6 comments

The argument could be made that private ML training datasets with public consequences are a fundamental problem.
Wouldn't that just mean you'd have to make sure the data doesn't contain private identifying information to start with? Seems like a win-win for the people in datasets and the people who are auditing the code.
"Wouldn't that just mean you'd have to make sure the data doesn't contain private identifying information to start with"

That condition is not easy.

It is very hard to have data about people without private identifying information in it, especially now that there is face recognition and the like.

Then ML simply isn't a viable solution to that problem?
If the person does not have the right to audit the model, i.e. to determine exactly how and why it flagged them, and being flagged has consequences for their interactions with the government, I would argue that it's a violation of due process. If it's impossible to meet that standard with ML in a satisfactory way, then perhaps ML should not be used in those contexts at all?
I found the Debian Deep Learning Team's Machine Learning policy (which covers this kind of stuff) very interesting:

https://salsa.debian.org/deeplearning-team/ml-policy

> It will depend if their training data is considered part of the source code.

It has to be; otherwise you could argue that JavaScript is just data dynamically loaded by an open-source browser.

Why should they not? Compliance is hard even for very mundane things.