| HN Mirror


by FlyingSaucer 1919 days ago

> Fixing racial bias in AI is not just a matter of infusing the training data with more melanin (for example), the AI Ethics crowd argues — the actual models are being developed by white guys, and their insular, white-guy priorities somehow surface as bias in the algorithms that go to work on the training data.

I recently created a natural language generation model (built mostly with LSTM layers) that was trained on East Asian Zen books. Do you think that my result could have been better if I had used an architecture not designed by white Germans?

To me, this idea seems like anthropomorphizing the model architecture for no real reason.

I do feel like there might be issues of ingrained bias in the model itself when using trained NLP embeddings or even some facial feature recognition algorithms that were tested on racially homogeneous groups.


webmaven 1918 days ago

> I recently created a natural language generation model (built mostly with LSTM layers) that was trained on East Asian Zen books. Do you think that my result could have been better if I had used an architecture not designed by white Germans?

Potentially. Ways this might happen: most SOTA architectures are dependent on larger datasets and abundant compute resources for training (as well as related tasks such as hyperparameter optimization or architecture search). Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

So, yes, adapting an architecture created in the first place with the kinds of constraints you are likely to face in mind, by folks that are more likely to be facing similar constraints themselves, may very well lead to you achieving as good or better results with similar or less effort and expense.


FlyingSaucer 1918 days ago

I think I agree with your general sentiment, but LSTMs specifically were first proposed in a 1997 paper [1], so my guess is that the authors didn't have vast resources compared to today. I mean, I trained my model on a GPU for a few days, which I'm guessing is actually more resources than they had.

> Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

That is an interesting point, and I feel like it generalizes to the idea that you are better off using a more efficient architecture, perhaps one designed by someone with a smaller training budget. Although I must say that, from my limited DL paper reading, efficient small-scale novel architectures don't necessarily come from cash-strapped researchers, as a more efficient architecture (in energy and time) would also be of huge economic value to companies like OpenAI, who have spent huge amounts on training GPTs.

[1] - https://www.bioinf.jku.at/publications/older/2604.pdf


webmaven 1917 days ago

I guess it is all about which questions you are trying to answer. Generally speaking, at the OpenAI and DeepMind end of the scale, the question is "(How) can we escape the diminishing returns of applying ever more data and compute to improve upon the SOTA?" rather than "(How) can we improve upon the SOTA without applying more data and compute?".
