HN Mirror


by koutetsu 810 days ago

Let me quote from the article:

> Lavender learns to identify characteristics of known Hamas and PIJ operatives, whose information was fed to the machine as training data, and then to locate these same characteristics — also called “features” — among the general population, the sources explained. An individual found to have several different incriminating features will reach a high rating, and thus automatically becomes a potential target for assassination.

It literally says that they use data from known Hamas members as training data (we don't know what this data contains), which is a recipe for biased predictions. Hamas members are a small minority in Gaza (the total population is over 2 million people), so the real data is heavily imbalanced[0], and unless that imbalance is addressed it leads to bad models.
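To make the imbalance point concrete, here is a minimal sketch of the base-rate arithmetic. All numbers are hypothetical and picked only for illustration; nothing here comes from the article or from any real system. `precision_at_base_rate` is a name I made up for this example.

```python
def precision_at_base_rate(prevalence, sensitivity, false_positive_rate):
    """Fraction of flagged items that are true positives (Bayes' rule).

    prevalence:          share of the population in the rare class
    sensitivity:         P(flagged | rare class)
    false_positive_rate: P(flagged | not rare class)
    """
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: catches 90% of the rare class, wrongly flags 5% of everyone else.
balanced = precision_at_base_rate(0.50, 0.90, 0.05)    # classes equally common
imbalanced = precision_at_base_rate(0.01, 0.90, 0.05)  # rare class is 1% of the population

print(f"precision with balanced classes: {balanced:.2f}")
print(f"precision with 1% prevalence:    {imbalanced:.2f}")
```

The same classifier that looks good on a balanced validation set ends up with most of its flags being false positives once the rare class really is rare.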

On top of that, if you know anything about machine learning, you should be aware that models find spurious correlations[1] in the data. These make their predictions look accurate on the available training and validation data, and much less so once the model is deployed and used with real data.

[0] https://developers.google.com/machine-learning/data-prep/con...

[1] https://thegradient.pub/shortcuts-neural-networks-love-to-ch...

1 comments

onethought 809 days ago

Thank you for repeating what I said. If these features are: “carrying weapon” or “visiting known Hamas military site” - then the risk of unintended bias is lower.

If the features are things like “wears a scarf” or “has a beard” then I agree unintended bias is likely a problem. But given we don’t know, how can we comment?

koutetsu 809 days ago

Looking at this from a machine learning perspective, the risk of bias is even higher in these cases because of data drift (members could change sites, start dressing differently, etc.) and imbalance in the dataset (there are far fewer Hamas members than civilians in Gaza).
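As a rough illustration of what data drift does, here is a toy sketch with a single abstract numeric feature and a threshold rule fitted on training data. The distributions, sample size, and seed are all assumptions made up for the example and say nothing about any real system.

```python
import random


def recall(positive_samples, threshold):
    """Fraction of true positives whose feature value exceeds the threshold."""
    return sum(x > threshold for x in positive_samples) / len(positive_samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
N = 5000

# Training time: the positive class sits clearly above the negative class.
train_pos = [rng.gauss(2.0, 1.0) for _ in range(N)]
train_neg = [rng.gauss(0.0, 1.0) for _ in range(N)]

# "Model": a threshold halfway between the two class means seen in training.
threshold = (sum(train_pos) / N + sum(train_neg) / N) / 2

# Deployment time: the positive class has changed behavior and now looks
# much more like the negative class on this feature.
drifted_pos = [rng.gauss(0.5, 1.0) for _ in range(N)]

recall_before = recall(train_pos, threshold)
recall_after = recall(drifted_pos, threshold)

print(f"recall on training-like data: {recall_before:.2f}")
print(f"recall after drift:           {recall_after:.2f}")
```

The threshold never changes, but recall drops sharply because the feature it relies on no longer separates the classes the way it did during training.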

Additionally, judging from the amount of data such models would have to go through in order to make predictions (social media, camera footage, etc.), I would assume that they are using neural networks. This type of model performs best with raw, unprocessed data, e.g. raw camera footage, rather than preprocessed features like "wears a scarf" or "carrying a weapon". Neural networks are also well known to be black boxes whose predictions cannot really be explained [0].

We can still comment on this topic based on assumptions and previous experience. I don't have experience working in the military field, but I do have experience working in the AI field, and these are strong assumptions I am making.

[0] https://arxiv.org/abs/1811.10154