Not generally. The point is that whether one feature is deemed more valuable than another depends not just on the data fed into the system but also on the training method used.

Specifically, the gp is pointing out that typical approaches will give little weight to a feature that has few data points associated with it. In other words, if the model hasn't seen much of something, it won't "form an opinion" about it, and the other features will end up determining the output value.

The gp also points out that if you accidentally did something (say, fed in non-tech resumes) that exposed your model to an otherwise missing feature (say, predominantly female hobbies or women's colleges or whatever) in a negative light, then you would have (inadvertently) directly trained your model to treat that feature as a negative.
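To make both effects concrete, here is a rough sketch, assuming an L2-regularized logistic regression trained by plain gradient descent (one "typical approach" among many, not necessarily what the gp had in mind). The data, the feature names (`skill`, `rare`), and the helper `train_logistic` are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, l2=0.01, lr=1.0, steps=1000):
    """Batch gradient descent on mean log-loss with an L2 penalty on the weights.

    rows: list of (skill, rare, hired) tuples with 0/1 values (hypothetical data).
    Returns (bias, w_skill, w_rare). The bias is not regularized.
    """
    b = w_skill = w_rare = 0.0
    n = len(rows)
    for _ in range(steps):
        g_b = g_skill = g_rare = 0.0
        for skill, rare, hired in rows:
            err = sigmoid(b + w_skill * skill + w_rare * rare) - hired
            g_b += err
            g_skill += err * skill
            g_rare += err * rare
        b -= lr * g_b / n
        w_skill -= lr * (g_skill / n + l2 * w_skill)
        w_rare -= lr * (g_rare / n + l2 * w_rare)
    return b, w_skill, w_rare

# Tech resumes only: the rare feature shows up twice, once hired and once not,
# and the outcome is fully explained by skill.
tech_only = [(1, 0, 1)] * 99 + [(0, 0, 0)] * 99 + [(1, 1, 1), (0, 1, 0)]

# Same data plus 50 non-tech resumes that all carry the rare feature
# and were all rejected (for lack of skill, not because of the feature).
with_non_tech = tech_only + [(0, 1, 0)] * 50

_, _, w_rare_sparse = train_logistic(tech_only)
_, _, w_rare_exposed = train_logistic(with_non_tech)
print("rare-feature weight, tech resumes only:   ", round(w_rare_sparse, 3))
print("rare-feature weight, with non-tech added: ", round(w_rare_exposed, 3))
```

With these made-up numbers the first weight stays close to zero (the model has no opinion), while the second comes out clearly negative, even though the added rejections are already explained by the missing skill.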

Of course, another (hacky) hypothetical (noted elsewhere in this thread) would be to use "resume + hire/pass" as your data set. In that case, your model would simply try to emulate your current hiring practices. If your current practices are notably biased for or against a given feature, then your model presumably will be too.
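A toy illustration of that last point, assuming the simplest possible "model": the maximum-likelihood hire rate per group. The history, the 70%/30% split, and the function name `fit_hire_rates` are invented for the example.

```python
from collections import defaultdict

# Hypothetical past decisions: (qualified, has_feature, hired).
# Everyone is equally qualified; only the feature differs.
history = (
    [(1, 0, 1)] * 70 + [(1, 0, 0)] * 30    # feature absent: 70 of 100 hired
    + [(1, 1, 1)] * 30 + [(1, 1, 0)] * 70  # feature present: 30 of 100 hired
)

def fit_hire_rates(rows):
    """Return the empirical hire rate for each (qualified, has_feature) group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [hired, total]
    for qualified, has_feature, hired in rows:
        counts[(qualified, has_feature)][0] += hired
        counts[(qualified, has_feature)][1] += 1
    return {group: h / t for group, (h, t) in counts.items()}

model = fit_hire_rates(history)
print("predicted hire rate without feature:", model[(1, 0)])
print("predicted hire rate with feature:   ", model[(1, 1)])
```

The fitted rates reproduce the historical gap exactly. A more flexible model trained on the same labels would have no reason to do otherwise, since matching past decisions is precisely what it is being rewarded for.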