by webmaven
1919 days ago
> I recently created a natural language generation model (built mostly with LSTM layers) that was trained on East Asian Zen books. Do you think that my result could have been better if I would've used an architecture not designed by white Germans?

Potentially. Ways this might happen: most SOTA architectures are dependent on larger datasets and abundant compute resources for training (as well as related tasks such as hyperparameter optimization or architecture search). Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets.

So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

So, yes: adapting an architecture that was designed from the start with the kinds of constraints you are likely to face, by people who are more likely to be facing similar constraints themselves, may very well let you achieve equal or better results with similar or less effort and expense.
>Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.
That is an interesting point, and I feel it generalizes: you may get better results from a more efficient architecture, perhaps one designed by someone with a smaller training budget. Although I must say that, from my limited DL paper reading, efficient small-scale novel architectures don't necessarily come from cash-strapped researchers, as a more efficient architecture (in energy and time) would also be of huge economic value to companies like OpenAI, which have spent huge amounts on training GPTs.
[1] - https://www.bioinf.jku.at/publications/older/2604.pdf