Hacker News
by zwaps 1553 days ago
If my model is used to profile a given user so as to maximize revenue from them (my objective generally increases with more accurate classification of a user, to the degree that such categories are revenue-relevant), does this model still work?

If so, how is it privacy compliant, i.e. how does it satisfy the intent of the law in, say, EU countries, or avoid being identified as "privacy theater" in the US? If not, what do you do in these cases?

Cool to get your take on this.

If the model training is designed to profile just one user, then no, the model won't work, by design. What you describe is an attack on that user's privacy, and we do want to make sure such attacks fail.

The way differential privacy works with machine learning is that it guarantees that no single record can have a significant impact on the model's weights, and therefore on its performance. For SGD-based models in particular, the guarantee holds at every step of the descent. A good place to start on the topic is Abadi et al. 2016 (https://arxiv.org/pdf/1607.00133.pdf).
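
A minimal NumPy sketch of one DP-SGD update in the spirit of Abadi et al. (2016): clip each record's gradient to a fixed L2 norm, add Gaussian noise scaled to that bound, then take an ordinary step. The function name, its parameters and the toy data are illustrative assumptions, not Sarus's implementation, and the privacy accounting that converts `noise_multiplier` and the number of steps into an (ε, δ) budget is left out.

```python
import numpy as np

def dp_sgd_step(weights, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One illustrative DP-SGD update.

    per_example_grads has shape (batch_size, n_params): one gradient per record.
    """
    batch_size = per_example_grads.shape[0]
    # 1. Clip each record's gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # 2. Sum the clipped gradients and add Gaussian noise with std noise_multiplier * clip_norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=weights.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch_size
    # 3. Ordinary gradient step on the privatized gradient.
    return weights - lr * noisy_mean

# Usage with toy per-record gradients (stand-ins for real backprop output).
rng = np.random.default_rng(0)
weights = np.zeros(3)
per_example_grads = rng.normal(size=(8, 3))
weights = dp_sgd_step(weights, per_example_grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
```

Because every record's contribution is clipped before noise is added, the same bound applies at every step, whatever the record contains.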

What is important in this approach is that we don't need to detect whether there is something funny in the model's loss function. Sarus uses exactly the same approach whether or not the model or the loss function is malicious, and the guarantees still hold. This matters because many models can extract personal information with no intention of doing so and no real way to detect it.
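
The point that the mechanism never inspects the loss can be illustrated with a toy check, assuming per-record clipping as in DP-SGD: however large the gradient a hypothetical malicious loss produces for one targeted record, removing that record changes the clipped gradient sum by at most `clip_norm`. Clipping only bounds the influence; the Gaussian noise added on top is what hides it. The helper names and data are hypothetical.

```python
import numpy as np

def clip_rows(grads, clip_norm):
    """Scale each per-record gradient (one row per record) to L2 norm <= clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    return grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

def influence_of_record(grads, i, clip_norm):
    """L2 change in the clipped gradient sum when record i is removed."""
    with_i = clip_rows(grads, clip_norm).sum(axis=0)
    without_i = clip_rows(np.delete(grads, i, axis=0), clip_norm).sum(axis=0)
    return float(np.linalg.norm(with_i - without_i))

# Benign loss: small, similar gradients for every record.
benign = np.full((5, 2), 0.1)
# Hypothetical malicious loss aimed at record 0: an enormous gradient on that record.
malicious = benign.copy()
malicious[0] = [1e6, -1e6]

for name, grads in [("benign", benign), ("malicious", malicious)]:
    # Both stay at or below clip_norm (up to rounding): the attack is capped, not detected.
    print(name, influence_of_record(grads, 0, clip_norm=1.0))
```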

A good way to think about model performance is that we are looking for models that perform well irrespective of any one record. If many users share the pattern of the user you are trying to spy on, the model may still be good, but you won't know whether that is because of that user or not.
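
"Performing well irrespective of one record" is what the standard (ε, δ)-differential privacy definition, the one used by Abadi et al. (2016), makes precise. For a randomized training mechanism M, any two datasets D and D' that differ in a single record (for example, with and without the user being spied on), and any set S of possible trained models:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

With small ε and δ, the distribution of trained models is nearly the same whether or not that user's record was included, so the model can be accurate on users who share a pattern without revealing much about any one of them.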

Sarus typically fits organizations that legitimately collect personal data and can make decisions based on those data. In these cases you don't want to (and most of the time legally cannot) give everyone in the organisation access to the full personal data records. With Sarus, anyone, even untrusted parties, can run analyses on your data safely. Those analyses can include fitting classification models. You can then accurately classify users to maximize your revenue, as long as you can observe the values to feed into your classifier.