Hacker News
by webmaven 1919 days ago
> I recently created a natural language generation model (built mostly with LSTM layers) that was trained on East Asian Zen books. Do you think that my result could have been better if I had used an architecture not designed by white Germans?

Potentially. Ways this might happen: most SOTA architectures are dependent on larger datasets and abundant compute resources for training (as well as related tasks such as hyperparameter optimization or architecture search). Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

So, yes, adapting an architecture that was designed from the start around the kinds of constraints you are likely to face, by people who are more likely to face similar constraints themselves, may very well get you results that are as good or better with similar or less effort and expense.


I think I agree with your general sentiment, but LSTMs specifically were first proposed in a 1997 paper[1], so my guess is that their authors did not have vast resources by today's standards. I mean, I trained my model on a GPU for a few days, which I'm guessing is actually more resources than they had.

> Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

That is an interesting point, and I feel it generalizes to the idea of using a more efficient architecture, one that was perhaps designed by someone with a smaller training budget. Although I must say that, from my limited reading of DL papers, efficient small-scale novel architectures do not necessarily come from cash-strapped researchers, since a more efficient architecture (in energy and time) would also be of huge economic value to companies like OpenAI, which have spent huge amounts on training GPTs.

[1] - https://www.bioinf.jku.at/publications/older/2604.pdf

I guess it is all about which questions you are trying to answer. Generally speaking, at the OpenAI and DeepMind end of the scale the question is "(How) can we escape the diminishing returns of applying ever more data and compute to improve upon the SOTA?" rather than "(How) can we improve upon the SOTA without applying more data and compute?".