by hdhshdhshdjd
705 days ago
Maybe in SOTA ML/NLP research, but in the world of building useful tools and products, BERT models are dead simple to tune, work great if you have decent training data, and most importantly are very, very fast and very, very cheap to run. I have a small Swiss Army knife collection of custom BERT fine-tunes that are equal to or better than the best LLM and execute document classification tasks in 2.4ms. Find me an LLM that can do anything in 2.4ms.
Also the output of a purpose-built encoder model is preferable to natural language. Not only is it unambiguous, but scores are often an important part of the result.
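To make the point about scores concrete, here is a rough sketch of what a classification head's output looks like once you softmax it. The logits and label names are made up for illustration, and `classify` is a hypothetical helper, not part of any library; a real fine-tune would produce the logits from the encoder.

```python
import math

def classify(logits, labels):
    """Turn raw classifier logits into a (label, score) pair via softmax."""
    # Subtract the max logit first for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the index with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical logits for one document and three made-up classes.
label, score = classify([2.0, 0.5, -1.0], ["invoice", "contract", "other"])
print(label, round(score, 3))
```

You get one label from a fixed set plus a number you can threshold on, with no parsing of free-form text.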
Last, if you need to get into some advanced methods of training, like pseudolabeling and semi-supervised learning, there are different options and outlets for utilizing real-world datasets.
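For anyone unfamiliar with pseudolabeling, a minimal sketch of one common variant: run the current model over unlabeled data and keep only the confident predictions as extra training examples. The 0.95 threshold, `pseudo_label`, and the toy `toy_predict_proba` stand-in are all assumptions for illustration, not a description of any particular setup.

```python
def pseudo_label(unlabeled, predict_proba, threshold=0.95):
    """Keep only the unlabeled items the current model is confident about."""
    selected = []
    for text in unlabeled:
        probs = predict_proba(text)
        best = max(range(len(probs)), key=probs.__getitem__)
        # Low-confidence predictions are skipped rather than trusted.
        if probs[best] >= threshold:
            selected.append((text, best))
    return selected

# Hypothetical stand-in for a fine-tuned model's probability output.
def toy_predict_proba(text):
    return [0.98, 0.02] if "invoice" in text else [0.55, 0.45]

extra = pseudo_label(["invoice #123", "misc note"], toy_predict_proba)
print(extra)
```

The selected pairs would then be mixed into the labeled set for another round of fine-tuning; how many rounds and what threshold to use depends on the data.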
That said, I’m not sure there’s much value in scaling up current encoder models. It seems like there’s already a point of diminishing returns.