by rjknight 3980 days ago
This problem sneaks up on us, because in non-emotive contexts it's normally acceptable to say "what percentage of users will be affected by this?", and if it's below a certain percentage it may be the case that it's not worth the cost of fixing the problem. However, whilst it's acceptable to give a degraded service to people who use Internet Explorer 6, it being presumed that they have the option of using a more modern browser, it's not acceptable to do the same to people with non-binary gender identities (I mean, you can do it, and plenty of people do, but it will upset at least some people and you're not going to have any good answer for their complaints). What looks like an edge case from an engineering perspective looks like a fundamental part of their identity to the person who doesn't fit the available categories.

In these cases, I think there are two approaches that satisfy the requirements of simplicity and humanity: think very carefully about whether you need that field and, if not, just remove it; alternatively, make it a free input field. If you think that collecting as much data as possible is a good thing, then the free input field gives you the best possible scenario - the strictly most accurate, detailed data direct from the person who is in the best position to tell you. If you don't need it, then just don't ask.

1 comments

Of course, depending on why you're asking the question in the first place, a free input field can be a problem, because there can be many ways of expressing the same concept. So if you have a free-form input, you're going to need to figure out how to analyze that data to produce the groupings that you really want. But if you are capable of doing that, then absolutely, a free-form input field is the best way of avoiding unintentionally discriminating against anyone.
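
As a rough sketch of what that grouping step might look like in Python: normalize each answer, fold known spellings into one label, and leave anything unrecognised as its own group rather than forcing it into a box. The synonym table, the group labels and the function names here are all made up for illustration; a real table would have to come from looking at the answers you actually receive.

```python
from collections import Counter

# Hypothetical synonym table: every key and group label is an assumption
# made for this example, not a recommended taxonomy.
SYNONYMS = {
    "f": "woman",
    "female": "woman",
    "woman": "woman",
    "m": "man",
    "male": "man",
    "man": "man",
    "nb": "non-binary",
    "nonbinary": "non-binary",
    "non-binary": "non-binary",
}


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(answer.lower().split())


def group_answers(answers):
    """Count answers per group; unrecognised answers keep their own normalized text."""
    counts = Counter()
    for raw in answers:
        key = normalize(raw)
        if not key:
            continue  # blank answer: the person chose not to say
        counts[SYNONYMS.get(key, key)] += 1
    return counts


if __name__ == "__main__":
    print(group_answers(["Female", " F ", "male", "Non-Binary", "genderqueer", ""]))
```

The point of falling back to the person's own wording is that the raw data is never lost: the grouping is a view over the answers, and you can change the table later without asking anyone again.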

There's also a distinction here between questions that are objective and questions that are subjective. Many would argue gender is objective, but they'd be wrong: it's subjective. You can't look at a person and tell them what their gender is; it's something only they can decide. So a free-form input field for Gender would be great, because everybody can feel fairly represented. But if you're asking for, say, Age, that's objective, and you can get away with letting the user choose from a set of ranges without any risk of discrimination (assuming the ranges cover the ages of all possible users).
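
To make the "cover the full range" caveat concrete, here is a minimal Python sketch of age ranges with an open-ended top bucket, so no valid age falls outside the options. The boundaries and the `age_range` name are assumptions picked for the example, not a suggested set of ranges.

```python
from bisect import bisect_right

# Hypothetical lower bounds of each range; the last range is open-ended.
LOWER_BOUNDS = [0, 18, 25, 35, 50, 65]


def age_range(age: int) -> str:
    """Return the label of the range containing `age`."""
    if age < 0:
        raise ValueError("age cannot be negative")
    # Index of the last lower bound that is <= age.
    i = bisect_right(LOWER_BOUNDS, age) - 1
    low = LOWER_BOUNDS[i]
    if i == len(LOWER_BOUNDS) - 1:
        return f"{low}+"  # open-ended top range covers every older age
    return f"{low}-{LOWER_BOUNDS[i + 1] - 1}"


if __name__ == "__main__":
    for age in (0, 17, 18, 64, 65, 103):
        print(age, age_range(age))
```

Because the first bound is 0 and the last range has no upper limit, every non-negative age lands in exactly one range.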

> So if you have a free-form input, you're going to need to figure out how to analyze that data to produce the groupings that you really want.

That suggests the root of the problem. Presumably the model is based on identifying a particular feature of the world as worth measuring [e.g. gender]. But boxing gender [e.g. masculine | feminine] is not a feature of the world, and the boxing means that the model does not correspond to the world in regard to gender, even though that correspondence was the purpose of capturing gender in the model. The idea of "getting the groupings I want" means my methods are scientifically suboptimal. The objective truths are in the data, not in my interpretation.