by yummyfajitas 3570 days ago

Yes, if your training data excludes relevant features then you can't use them. No one disputes this.

However, once you start including such people in your training data, these issues are not hard to prevent. In fact, ML systems will often do this by accident, even when you don't want them to (that is, when the correction points in a politically incorrect direction). This is called redundant encoding: the other features correlate with the omitted one strongly enough that the model effectively recovers it.

See the section of my blog post "What if we scrub race, but redundantly encode it?" where I do calculations to show the effect of this: https://www.chrisstucchio.com/blog/2016/alien_intelligences_...

In short, if your input data is biased against a group, but you include group membership either directly or via redundant encoding, an algorithm trained to predict the real outcome will correct for the bias as best it can.
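To make that concrete, here is a rough toy sketch. It is not the calculation from the blog post, and every number and name in it (`score`, `proxy`, the size of the penalty, the noise levels) is an assumption made up for illustration. A measured score is pushed down by a fixed amount for one group, and plain least squares is fit to the real outcome twice: once with the score alone, once with the score plus a noisy proxy for group membership.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000

# Hypothetical synthetic data; all parameters are illustrative assumptions.
group = rng.integers(0, 2, size=n)               # 1 = group the score is biased against
ability = rng.normal(0.0, 1.0, size=n)           # latent quantity we care about
outcome = ability + rng.normal(0.0, 0.5, size=n) # what the model is trained to predict
score = ability - 1.0 * group + rng.normal(0.0, 0.5, size=n)  # biased measurement
proxy = group + rng.normal(0.0, 0.3, size=n)     # redundant encoding of group


def fit_and_group_error(features):
    """Least-squares fit of outcome on the given features (plus an intercept).

    Returns the coefficients and the mean residual for group 1. A positive
    mean residual means the model under-predicts that group on average.
    """
    X = np.column_stack([np.ones(n)] + features)
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residual = outcome - X @ coef
    return coef, residual[group == 1].mean()


coef_blind, err_blind = fit_and_group_error([score])
coef_proxy, err_proxy = fit_and_group_error([score, proxy])

print("under-prediction of group 1, score only:   ", round(err_blind, 3))
print("under-prediction of group 1, score + proxy:", round(err_proxy, 3))
print("learned weight on the proxy:               ", round(coef_proxy[2], 3))
```

In this setup the score-only model under-predicts the penalized group, while the model that sees the proxy learns a positive weight on it and shrinks that gap. It does not remove the gap entirely, because the proxy is noisy, which is the "as best it can" part.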

The entire purpose of machine learning is to discover hidden features and correlations in messy data, so I fail to see why this is considered surprising.

I generally consider the "right to explanation" to be a fairly transparent attempt by the EU to keep American tech companies out of Europe. The entire purpose of ML is that it can uncover true facts that humans can't. The right to explanation is just an attempt to hobble this power, probably because few Euro companies can do it.