by westoncb 2614 days ago
> Except that's exactly what it is.

Using the term 'bias' has certain political motivations behind it. It's not so much about the term being technically untrue as it is about the term being non-neutral. For instance, here are some definitions of 'bias' I just grabbed from American Heritage:

"A preference or an inclination, especially one that inhibits impartial judgment."

"An unfair act or policy stemming from prejudice."

"A statistical sampling or testing error caused by systematically favoring some outcomes over others."

The ML model does not have a preference, inclination, or prejudice relating to interns, except insofar as we anthropomorphize it to have them. What does using a word that suggests it does add?

A more neutral account of what's going on is along these lines: It's easy to accidentally train ML models so that they will make systematic errors. (Among those errors is the possibility for it to exhibit behavior resembling prejudice.)

Fine: it's easy to accidentally train ML models so that they will make systematic errors. Often these errors stem from systematic biases in our society; model creators should therefore be aware of the potential biases[1] that their models could reflect, and how to prevent them.

[1]: With the political motivation.

> Often these errors stem from systematic biases in our society ...

Depending on what the appropriate quantification of 'often' is, that might make sense. Do we have enough reason to believe it would take on a high enough value to merit the usage of a term that refers only to it?

The other problem with what you're describing is that all we actually know is that the model is reflecting the current state of things. Your statement attributes particular causes to the current state of things, and implies a certain valuation of the current state of things (which I don't personally disagree with, necessarily—but I don't think my personal views should be reflected in scientific/engineering jargon).

So given the uncertain value of 'often,' and the unsettled nature of the causes behind various aspects of the 'current state of things,' it seems to be solidly jumping the gun to frame the entire general problem with a term that refers to this partial and fraught aspect of it.

>Your statement attributes particular causes to the current state of things

I didn't, nor should it matter how we got to where we are for a builder of a thing.

> and implies a certain valuation of the current state of things

This may have happened, but I'd disagree: recognizing that there exists inequality doesn't cast value judgement on that inequality. I simply stated that the biases are there. Perhaps saying "how to prevent them" is casting value judgement, so I might walk that back: model creators should be aware of the biases and aware of tools and strategies to account for them, if so desired.

Personally I think you're a bad person if, armed with the tools to detect and correct, you decide it's okay to build something that has a systemic error that wrongly disfavors some group. But perhaps that's just me.

> ... recognizing that there exists inequality doesn't cast value judgement on that inequality.

You just asserted your attribution of cause right there: inequality. There are multiple possible causes for differing demographic representations in various roles. This is not a settled issue, even though people on both sides promote competing ideologies to the effect that it is.

(And again, I have intentionally left my own views on the subject out of this, even though I suspect they align with yours (insofar as cause attribution goes): I'm just pointing out the fact that this isn't something society agrees on, nor is it something the scientific data resolves unambiguously.)

> Personally I think you're a bad person if, armed with the tools to detect and correct, you decide it's okay to build something that has a systemic error that wrongly disfavors some group.

Agreed, hinging on that point about cause attribution.

> Often these errors stem from systematic biases in our society

No, this does not match either.

One of the easiest ways to get an ML model that makes systematic errors is a spam filter. If I train on my spam folder with no further consideration, what the filter will learn is that any language which isn't my own is spam, and that servers located outside my nation are spammers. This resembles prejudice.

The cause of this systematic error is that individual email addresses do not get ham emails uniformly from every nation and every language. Proximity warps the data. I would need to normalize the data based on language and nation if I wanted to remove those errors in the filter. Looking at it from a political perspective does not make the filter perform better, and fixing it from that side has a high risk of causing even more errors in the model.
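
To make the normalization point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy mailbox counts are made up, and the helper names `balanced_weights` and `spam_rate_by_language` are hypothetical. It shows one possible reweighting (not the only way to normalize) in which ham and spam carry equal total weight within each language, so the language by itself stops being evidence of spam.

```python
from collections import Counter, defaultdict

LABELS = ("ham", "spam")

def balanced_weights(samples):
    """samples: list of (language, label) pairs, label in LABELS.
    Returns one weight per sample so that, within each language,
    ham and spam carry the same total weight. Languages that lack
    one of the two classes cannot be balanced and get weight 0."""
    counts = Counter(samples)  # how many mails per (language, label)
    weights = []
    for language, label in samples:
        has_both = all(counts[(language, l)] > 0 for l in LABELS)
        weights.append(1.0 / counts[(language, label)] if has_both else 0.0)
    return weights

def spam_rate_by_language(samples, weights):
    """Weighted fraction of spam per language (languages with zero
    total weight are left out)."""
    total = defaultdict(float)
    spam = defaultdict(float)
    for (language, label), w in zip(samples, weights):
        total[language] += w
        if label == "spam":
            spam[language] += w
    return {lang: spam[lang] / total[lang] for lang in total if total[lang] > 0}

# Hypothetical mailbox: mostly English ham, very little foreign-language ham.
samples = ([("en", "ham")] * 90 + [("en", "spam")] * 10
           + [("ru", "ham")] * 2 + [("ru", "spam")] * 18
           + [("zh", "spam")] * 5)

raw = spam_rate_by_language(samples, [1.0] * len(samples))
balanced = spam_rate_by_language(samples, balanced_weights(samples))

print("raw:", raw)            # the skew a naive filter would learn
print("balanced:", balanced)  # language no longer predicts spam
```

In the raw data the language alone nearly decides the label; after reweighting, every language that has both classes sits at an even split, and a classifier trained with these weights has to rely on the message content instead. A language with no ham at all (the "zh" rows here) cannot be fixed by reweighting; it simply needs more data.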