| It's the methods being used to select, populate, label, and validate the training set that are the problem. Basically, your team is composed of white dudes who don't see the problem with an ML training set consisting largely of pictures of white dudes. To prevent this, they'd have needed to A) employ a black person, and B) listen to said employee's feedback, in order to recognize the problem. Edit: Also worth pointing out that just using a representative population sample would still show racial bias, since accuracy would essentially end up weighted by each group's share of the population. You'd probably need equal numbers of pictures of people from all races/genders/disabilities if you wanted equal accuracy across the board. That also includes picture quality and the range of picture quality. Doubling up images, or using corporate headshots for the white dudes and grainy selfies for People of Color, could still cause issues. The same logic applies to labelling. The minimum-wage contracting firm used to decide who's who in the photos may exhibit racial bias, by virtue of the fact that most people do. If their labelling accuracy is racially biased, then so too will be the algorithms trained on those labels. In short: Racist garbage in, racist garbage out. |
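
To make the sampling point concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the group names "A" and "B", the counts, and the helper names `per_group_accuracy`, `overall_accuracy`, and `balanced_sample` are all hypothetical and not from any real system. It just shows how one headline accuracy number can hide a weak result for a small group, and one simple way to draw equal-sized samples per group.

```python
import random
from collections import Counter, defaultdict


def per_group_accuracy(records):
    """records: iterable of (group, correct) pairs. Returns {group: accuracy}."""
    totals, hits = Counter(), Counter()
    for group, correct in records:
        totals[group] += 1
        hits[group] += int(correct)
    return {g: hits[g] / totals[g] for g in totals}


def overall_accuracy(records):
    """Single headline number: implicitly weighted by how many samples each group has."""
    records = list(records)
    return sum(int(correct) for _, correct in records) / len(records)


def balanced_sample(items, group_of, n_per_group, seed=0):
    """Draw the same number of items from every group (without replacement)."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for item in items:
        by_group[group_of(item)].append(item)
    sample = []
    for group, members in by_group.items():
        if len(members) < n_per_group:
            raise ValueError(f"group {group!r} has only {len(members)} items")
        sample.extend(rng.sample(members, n_per_group))
    return sample


# Made-up evaluation results: 90 samples from group "A", 10 from group "B".
records = (
    [("A", True)] * 88 + [("A", False)] * 2
    + [("B", True)] * 6 + [("B", False)] * 4
)

by_group = per_group_accuracy(records)   # accuracy for "A" and "B" separately
overall = overall_accuracy(records)      # dominated by the larger group

# Equal-sized evaluation set: 10 records from each group.
balanced = balanced_sample(records, group_of=lambda r: r[0], n_per_group=10)
```

With these invented numbers the headline accuracy looks fine while group "B" does far worse, which is the "weighted by population percent" issue. Equal counts per group are only a starting point: they do nothing about differences in image quality or biased labels.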