

by spmurrayzzz 546 days ago

Historically, the problem with using LLMs for simple, conventional NLP tasks was that their output was hard to control. If you wanted a one-word answer for a classification task, you'd often get a paragraph back instead. This obviously hurts precision and accuracy quite a bit. There were tricks you could use to constrain the output (few-shot examples, GBNF grammars, training low-rank adapters, or even re-asking the model), but they weren't perfect.
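To make the "re-asking" trick concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `ask` is a hypothetical callable standing in for whatever model client you use, and `extract_label` and `classify_with_retry` are made-up helper names, not part of any library. It only accepts a response if exactly one allowed label appears in it, and otherwise asks again.

```python
import re
from typing import Callable, Optional, Sequence


def extract_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Return the one allowed label found in the response.

    Returns None when no label, or more than one distinct label, appears,
    since a paragraph-style answer mentioning several labels is ambiguous.
    """
    found = {
        label
        for label in labels
        if re.search(rf"\b{re.escape(label)}\b", response, flags=re.IGNORECASE)
    }
    return found.pop() if len(found) == 1 else None


def classify_with_retry(
    ask: Callable[[str], str],  # hypothetical: prompt in, raw model text out
    text: str,
    labels: Sequence[str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Re-ask the model until its answer contains exactly one allowed label."""
    prompt = (
        f"Classify the text as one of: {', '.join(labels)}. "
        f"Answer with one word.\n\nText: {text}"
    )
    for _ in range(max_attempts):
        label = extract_label(ask(prompt), labels)
        if label is not None:
            return label
    return None  # caller decides how to handle an unparseable answer
```

This is the imperfect kind of workaround described above: it costs extra model calls and can still come back with nothing, which is why grammars or better instruction following are preferable when available.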

Over the last 12-18 months, though, the instruction-following capabilities of the models have improved substantially. This new Mistral model in particular is fantastic at doing what you ask.

My approach to this, personally and professionally, is to just benchmark. If I have a classification task, I try a tiny model first, then eval both it and an LLM to see how much improvement the LLM would give me. Generally speaking, though, the VRAM costs of the LLM are so high that it's often not worth it. It really is a case-by-case decision. Sometimes you want one generic model to do a bunch of tasks rather than training/finetuning a dozen small models that you then have to manage in production.
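A rough sketch of what that benchmark can look like in Python, assuming you have a labeled eval set and two prediction functions. The names `accuracy`, `compare`, `small`, and `large` are hypothetical, and plain accuracy is used only for simplicity; swap in whatever metric fits your task.

```python
from typing import Callable, Dict, Sequence, Tuple

Example = Tuple[str, str]  # (input text, gold label)


def accuracy(predict: Callable[[str], str], examples: Sequence[Example]) -> float:
    """Fraction of examples where the predicted label equals the gold label."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, gold in examples if predict(text) == gold)
    return correct / len(examples)


def compare(
    small: Callable[[str], str],  # hypothetical: the tiny task-specific model
    large: Callable[[str], str],  # hypothetical: the LLM-backed classifier
    examples: Sequence[Example],
) -> Dict[str, float]:
    """Score both models on the same eval set and report the LLM's gain."""
    small_acc = accuracy(small, examples)
    large_acc = accuracy(large, examples)
    return {"small": small_acc, "large": large_acc, "gain": large_acc - small_acc}
```

The `gain` number is what you weigh against the extra VRAM: if it is small, the tiny model usually wins.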