by bravura 1252 days ago
New AI tasks are being unlocked by (large-scale) foundation models (Liang, 2022).

Fine-tuning in low-resource (few-shot) scenarios is now possible for many new applications.

However, these new AI applications relied on a huge pretrained model to get there, because the old approach of training from scratch on 100 labeled examples didn't work well.

Thus, we want to distill the knowledge so that the model can be deployed in low-resource scenarios.
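To make "distill" concrete, here is a minimal sketch, assuming the common soft-label setup in which a small student model is trained to match a large teacher's temperature-softened outputs. The logits, the temperature value, and the function names are all made up for illustration; a real setup would compute this loss inside a training loop, usually mixed with the ordinary supervised loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits for a 3-class task (illustrative numbers only).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]   # roughly agrees with the teacher
bad_student = [0.5, 1.0, 4.0]    # ranks the classes in the opposite order

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, bad_student))
```

The student that agrees with the teacher gets the smaller loss, and a student whose logits equal the teacher's gets a loss of zero. Minimizing this quantity is what pushes the small model toward the big model's behavior.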

[edit: I see your comment below about transformer cost. Agreed. This is one of the many concerns around foundation models that must be understood. The happy path is that training the foundation model is a one-time cost that pays dividends in the many tasks it unlocks. However, you are correct that the research to get there is quite spendy. I encourage you to skim this paper. It's long but very accessible: https://arxiv.org/pdf/2108.07258.pdf

ps the reason commercial use cases all use simple models is not because simple models are ipso facto commercially valuable. It's just that industry practitioners are too overworked to do fancy bleeding-edge stuff. Thus, the dirty secret is that most fancy ML companies are just using logistic regression for everything. Foundation models allow industry practitioners to rapidly train powerful, accurate models. The question now is how they deploy them.]

1 comments

Thanks for sharing, I'll give it a read.

Perhaps I wasn't clear. I fully believe in transformers and foundation models; my criticism is more about creeping model size and whether using huge models is even the right approach for someone seeking to deploy a transformer at scale.

Conveniently, I'm decently familiar with Dr. Liang's work, as he has done some really great stuff in the biomedical domain recently. Using his publications as an example and considering my domain (medical), isn't he showing that smaller models with different architectures (such as DRAGON) or models pretrained on in-domain text (such as PubMedGPT) are effective ways to get increased performance, rather than simply scaling a more general LLM to unusable sizes?

My experience thus far has been that fine-tuning BERT-large sized models works really well, can be deployed on hospital infrastructure for inference (granted the workstations in the radiology department all have decent GPUs) and doesn't need much in terms of inference optimization.

Perhaps I'm missing something here; I appreciate your input.

I appreciate your openness here. Based upon my background, I'll do a little bit of handwaving, so we can read the tea leaves and see where the puck is going, while not overly mixing metaphors.

Smaller models are a stop-gap solution because they are task-specific and can incorporate expert knowledge. The thrust of ML research over the past decade has been consolidation of effort and huge-scale training to replace expert knowledge (or using expert knowledge as micro-tasks to condition the huge-scale training). I bet a dollar to a dime that in several years these smaller models will be replaced by foundation models that are fine-tuned and possibly distilled, as the field does the following:

* Build foundation models.

* Discover weaknesses and blind-spots.

* Patch them either using more data or micro-tasks.

* Iterate.