by aszen
2173 days ago
Very interesting links, thanks for sharing. So the trend isn't changing: we still need bigger models to make progress in NLP and CV, and while the algorithmic efficiencies are promising, they aren't giving anywhere near the same improvements as larger models. I'm curious how long this trend will continue and whether anything promising could reverse it.
|
As long as our proof-of-concept solutions don't yet solve the task appropriately, as long as the solution is weak and/or brittle and worse than what we need for the main practical applications, most of the research focus, and the research progress, will be on models that try to give better results. It makes sense to disregard the compute cost and other practical inconveniences when working on pushing the bleeding edge, trying to make previously impossible things possible.
However, when tasks are "solved" from the academic proof-of-concept perspective, then generally the practical, applied work on model efficiency can get huge reductions in computing power required. But that happens elsewhere.
The concept of technology readiness level (https://en.wikipedia.org/wiki/Technology_readiness_level) is relevant. For the NLP and CV technologies that are at TRL 3 or 4, efficiency does not really matter as long as the model fits in whatever computing clusters you can afford. Efficiency mainly becomes an issue for widespread adoption in industry, by the time the same tech is at TRL 6 or so, and that work mostly gets done by different people in different organizations with different funding sources than the initial TRL 3 research.