by FlyingSaucer 1917 days ago
I think I agree with your general sentiment, but specifically LSTMs were first proposed in a 1997 paper[1], so my guess is that the authors didn't have vast resources compared to today. I mean, I trained my model on a GPU for a few days, which I'm guessing is actually more resources than they had.

>Few architectures are designed or evaluated based on smaller (fixed) corpora sizes and smaller (fixed) training budgets. Even few-shot learning tasks typically still require a huge amount of pre-training on large datasets. So researchers and practitioners constrained by fewer resources and smaller datasets (which may not apply to you specifically) trying to adapt popular architectures to their needs are disadvantaged. Compare the attention being given to energy budgets and similar constraints for inference as opposed to training and the disparity becomes fairly obvious.

That is an interesting point, and I feel like it generalizes to the idea that a more efficient architecture is perhaps more likely to have been designed by someone with a smaller training budget. Although I must say that, from my limited DL paper reading, efficient small-scale novel architectures don't necessarily come from cash-strapped researchers, as a more efficient (in energy and time) architecture would also be of huge economic value to companies like OpenAI, who have spent huge amounts on training GPTs.

[1] - https://www.bioinf.jku.at/publications/older/2604.pdf

1 comment

I guess it is all about which questions you are trying to answer. Generally speaking, at the OpenAI and DeepMind end of the scale the question is "(How) can we escape the diminishing returns of applying ever more data and compute to improve upon the SOTA?" rather than "(How) can we improve upon the SOTA without applying more data and compute?".