| HN Mirror



	by hdhshdhshdjd 705 days ago
	Maybe in SOTA ML/NLP research, but in the world of building useful tools and products, BERT models are dead simple to tune, work great if you have decent training data, and most importantly are very, very fast and very, very cheap to run. I have a small Swiss Army collection of custom BERT fine-tunes that are equal to or better than the best LLMs and execute document classification tasks in 2.4ms. Find me an LLM that can do anything in 2.4ms.

5 comments

deepsquirrelnet 705 days ago

Latency, throughput and cost are still very important for many applications.

Also the output of a purpose-built encoder model is preferable to natural language. Not only is it unambiguous, but scores are often an important part of the result.

Last, if you need to get into some advanced methods of training, like pseudolabeling and semi-supervised learning, there are different options and outlets for utilizing real-world datasets.

That said, I’m not sure there’s much value in scaling up current encoder models. It seems like there’s already a point of diminishing returns.
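
For readers unfamiliar with the pseudolabeling mentioned above, here is a minimal sketch of the selection step: take the classifier's scores on unlabeled documents and keep only the predictions it is very confident about, so they can be added to the training set for another round. The function name `select_pseudolabels`, the 0.95 cutoff, and the toy scores are assumptions for illustration, not anything the commenter described.

```python
from typing import List, Sequence, Tuple


def select_pseudolabels(
    probabilities: Sequence[Sequence[float]],
    min_confidence: float = 0.95,  # assumed cutoff; tune on held-out data
) -> List[Tuple[int, int]]:
    """Return (example_index, predicted_label) pairs whose top score clears the cutoff."""
    selected = []
    for index, row in enumerate(probabilities):
        # The predicted label is the position of the highest score in the row.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= min_confidence:
            selected.append((index, label))
    return selected


# Toy per-class scores for three unlabeled documents (made-up numbers).
unlabeled_scores = [
    [0.98, 0.02],  # confident: class 0
    [0.55, 0.45],  # ambiguous: skipped
    [0.03, 0.97],  # confident: class 1
]
pseudolabels = select_pseudolabels(unlabeled_scores, min_confidence=0.95)
print(pseudolabels)
```

The ambiguous middle document is left out, which is the point: only high-confidence predictions get recycled as labels, and a model that is confidently wrong will still poison the set.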


hdhshdhshdjd 704 days ago

Scores are also interesting in that you can get 1.0 match on a classification task, but if the model is dog shit it’s 1.0 of dog shit.

I’m still struggling with the degree to which I want to expose raw scores to users for that reason.

On the other hand, sometimes a document that is slightly above an arbitrary threshold isn’t great, or a document slightly below an arbitrary threshold may be fine.

I’m excited about how easy it is to do this stuff. The tooling is sophisticated enough now that you don’t need to know too much about the underlying mechanisms to do useful things. Once you get into those very fine distinctions, though, it’s still very difficult work.
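
One way to soften the arbitrary-threshold problem raised in this comment is to treat scores near the cutoff as a separate "needs review" bucket instead of forcing a yes/no. The sketch below is only an illustration: the `triage` name, the 0.5 threshold, and the 0.1 margin are all assumed values, not the commenter's setup.

```python
def triage(score: float, threshold: float = 0.5, margin: float = 0.1) -> str:
    """Bucket a classifier score into accept / review / reject.

    Scores within `margin` of `threshold` are routed to review rather than
    being decided by a hard cutoff.
    """
    if score >= threshold + margin:
        return "accept"
    if score <= threshold - margin:
        return "reject"
    return "review"


for score in (0.92, 0.55, 0.12):
    print(score, triage(score))
```

This does nothing about the other concern in the comment: a confident score from a bad model is still bad, so the band only helps once the model itself has been validated.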


Tostino 705 days ago

Want to share your collection with the class so we can all learn? Seems useful.


hdhshdhshdjd 705 days ago

Product in stealth for a little bit longer, so can’t say much. :-)


ipsum2 705 days ago

What does your swiss army collection do?


hdhshdhshdjd 705 days ago

Document classification in a highly ambiguous contextual space. Solving some specific large-scale classification tasks, so multi-million-document sets.


Seattle3503 705 days ago

What technique do you use to get BERT to work on longer documents?


hdhshdhshdjd 705 days ago

512 tokens has been sufficient to solve my problems. I had made some initial attempts with BigBird that weren’t going well, but then realized I didn’t really need it.

I may revisit at some point.
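
For anyone who does hit the 512-token limit, a common workaround (not what the commenter uses, since truncation was enough for them) is to split the token sequence into overlapping windows, classify each window, and combine the scores. A minimal sketch of the windowing step, with the `sliding_windows` name and the stride of 256 as assumptions:

```python
from typing import List


def sliding_windows(
    token_ids: List[int],
    max_len: int = 512,  # BERT's usual maximum sequence length
    stride: int = 256,   # assumed step; a stride below max_len gives overlap
) -> List[List[int]]:
    """Split a token sequence into windows of at most max_len tokens.

    Note: a real BERT input also needs the special [CLS] and [SEP] tokens,
    so the usable content per window would be max_len - 2.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break  # this window already reaches the end of the document
        start += stride
    return windows


fake_document = list(range(1000))  # stand-in for real token ids
windows = sliding_windows(fake_document)
print([len(window) for window in windows])
```

How to combine the per-window scores (max, mean, or something learned) is a separate decision that depends on the task.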


llm_trw 705 days ago

Yeah, pretty much. When you have 2B files you need to troll through, good luck using anything but a vector database. Once you do a level or two of pruning of the results, you can feed them into an LLM for final classification.
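
A toy sketch of that prune-then-classify shape, under loud assumptions: the brute-force cosine scan stands in for a vector database's nearest-neighbor search, and `expensive_classifier` is a hypothetical callable standing in for the LLM call. Neither reflects any particular product's API.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prune_then_classify(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    expensive_classifier: Callable[[str], str],  # hypothetical LLM stand-in
    keep: int = 2,
) -> Dict[str, str]:
    """Rank documents by similarity, then run the costly step on the top few only."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return {doc_id: expensive_classifier(doc_id) for doc_id in ranked[:keep]}


calls: List[str] = []


def fake_llm(doc_id: str) -> str:
    calls.append(doc_id)  # record which documents reached the costly stage
    return "relevant"


docs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
result = prune_then_classify([1.0, 0.0], docs, fake_llm, keep=2)
print(result, calls)
```

Only the documents that survive the cheap similarity pass ever reach the expensive stage, which is what makes the approach workable at billions of files.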
