by haldujai
1252 days ago
Thanks for sharing, I'll give it a read. Perhaps I wasn't clear: I fully believe in transformers and foundation models. My criticism is more about creeping model size and whether huge models are even the right approach for someone seeking to deploy a transformer at scale. Conveniently, I'm decently familiar with Dr. Liang's work, as he has done some really great stuff in the biomedical domain recently. Using his publications as an example and considering my domain (medical), isn't he showing that smaller models with different architectures (such as DRAGON) or pretrained on in-domain text (such as PubMedGPT) are effective ways to get increased performance, rather than simply scaling a more general LLM to unusable sizes? My experience thus far has been that fine-tuning BERT-large-sized models works really well, that they can be deployed on hospital infrastructure for inference (granted, the workstations in the radiology department all have decent GPUs), and that they don't need much in terms of inference optimization. Perhaps I'm missing something here; I'd appreciate your input.
|
Smaller models are a stop-gap solution: they work now because they are task-specific and can incorporate expert knowledge. But the thrust of ML research over the past decade has been consolidation of effort and huge-scale training to replace expert knowledge (or using expert knowledge as micro-tasks to condition the huge-scale training). I'd bet a dollar to a dime that in several years these smaller models will be replaced by foundation models that are fine-tuned and possibly distilled, as the field does the following:
* Build foundation models.
* Discover weaknesses and blind spots.
* Patch them using either more data or micro-tasks.
* Iterate.