Text classifiers are an underrated application of LLMs | HN Mirror

	Text classifiers are an underrated application of LLMs (blog.kasperjunge.com)
	94 points by juunge 1009 days ago

16 comments

thewataccount 1009 days ago

From my experience, "single prompt classification" isn't as simple as "type in sentence and it works" in practice. But you can use some methods to massively improve its consistency/output.

I cannot recommend guidance enough. You can use shockingly small Llama models for some tasks with guidance while only actually generating a handful of tokens.

You should highly consider some form of guidance/logit bias for classification especially if you have a known set of classes. This will ensure you get it in the format that you want, with the correct classes that you want.
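A minimal sketch of the logit-bias idea, assuming you can see the model's next-token scores. The `fake_logits` values and the `constrained_classify` helper are made up for illustration and are not part of guidance or any other library; a real setup would take the scores from the model itself.

```python
import math

def constrained_classify(logits, classes):
    """Pick a class by ignoring every token that is not an allowed label.

    `logits` maps candidate next tokens to raw scores (a stand-in for what a
    real model would return); `classes` is the known label set.
    """
    # Dropping disallowed tokens is equivalent to giving them a -inf bias.
    allowed = {tok: s for tok, s in logits.items() if tok in classes}
    if not allowed:
        raise ValueError("none of the allowed classes appear in the logits")
    # Softmax over the surviving tokens gives a distribution over classes only.
    top = max(allowed.values())
    exp = {tok: math.exp(s - top) for tok, s in allowed.items()}
    total = sum(exp.values())
    probs = {tok: v / total for tok, v in exp.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical scores: unconstrained, the model would start with "Sure".
fake_logits = {"Sure": 5.0, "positive": 3.0, "negative": 1.0, "The": 2.5}
label, probs = constrained_classify(fake_logits, {"positive", "negative", "neutral"})
print(label, probs)
```

The output can only ever be one of the known classes, and the renormalized scores double as a rough confidence for the chosen label.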

Keep in mind LLMs perform much better with CoT (chain of thought). So you make it explain what the text/image is, then explain the possible classifications, then list its final decision. Again, guidance can ensure it follows the correct format to do this.

LLMs still massively benefit from finetuning, especially if you want to classify it in a particular format. Notebook tags vs SFW/NSFW vs important subjects, etc. Existing alignment can sometimes mess with some of these classifications too, which finetuning helps smooth out.

rckrd 1009 days ago

We use a similar trick and expose it via an API. Much easier to parse when you can guarantee the shape of the output

[0] https://thiggle.com/

IanCal 1009 days ago

Last time I used guidance it didn't work even for the examples in the repo, has it matured?

thewataccount 1009 days ago

Eh, I had some issues with Microsoft's "guidance", eventually got it working with an older version. They haven't updated it in a month either.

lmql might be a decent alternative.

Any form of logit bias should work though.

kcorbitt 1009 days ago

Yeah totally agree. We've found that a ton of OpenAI usage in practice is a variant of either classification or information extraction. This makes sense -- going from a human-native form of information (free text) to a computer-native form of information (structured data) is a key component of many pipelines!

Of course, GPT-4 is insanely expensive to use at scale, and still isn't a perfect classifier. So the next step is to take the outputs you get from GPT-4 and use them to fine-tune a smaller model that's really fast and good at your specific problem. In my experience, even without using any human annotations or online learning, a model fine-tuned just on GPT-4 outputs can actually outperform GPT-4 as a classifier! This seems really counterintuitive at first, but my guess is what's happening is that the training process is a kind of regularization, so the weird mistakes GPT-4 occasionally makes are overwhelmed in the training data by all the times when GPT-4 gets it right.
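A toy simulation of that regularization guess, under stated assumptions: the "teacher" is a stand-in that gives the right label 90% of the time and flips it at random otherwise, and the "student" is a one-parameter threshold model trained only on teacher labels. None of this involves GPT-4 or any real fine-tuning API; it only illustrates why independent random mistakes can wash out.

```python
import random

random.seed(0)

def true_label(x):
    return int(x > 0.0)

def teacher_label(x, noise=0.1):
    # Stand-in for the large model: usually right, randomly wrong otherwise.
    y = true_label(x)
    return 1 - y if random.random() < noise else y

# Unlabeled inputs, labeled only by the noisy teacher (no human annotations).
xs = [random.uniform(-1.0, 1.0) for _ in range(2000)]
teacher_ys = [teacher_label(x) for x in xs]

def fit_threshold(xs, ys):
    """Student: predicts 1 when x > threshold; pick the best threshold."""
    pairs = sorted(zip(xs, ys))
    # Threshold below every point: everything is predicted 1,
    # so the errors are the points labeled 0.
    errors = sum(1 for _, y in pairs if y == 0)
    best_errors, best_t = errors, pairs[0][0] - 1.0
    for x, y in pairs:
        # Moving the threshold up to x flips that point's prediction to 0.
        errors += 1 if y == 1 else -1
        if errors < best_errors:
            best_errors, best_t = errors, x
    return best_t

def accuracy(preds, xs):
    # Measured against the true labels, which the student never saw.
    return sum(p == true_label(x) for p, x in zip(preds, xs)) / len(xs)

threshold = fit_threshold(xs, teacher_ys)
teacher_acc = accuracy(teacher_ys, xs)
student_acc = accuracy([int(x > threshold) for x in xs], xs)
print(f"teacher accuracy: {teacher_acc:.3f}")
print(f"student accuracy: {student_acc:.3f}")
```

The student cannot memorize individual flipped labels, so it settles on the boundary that most labels agree on. This only holds when the teacher's mistakes are scattered; if the teacher is wrong in a consistent way, the student will learn that too.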

As a disclaimer, we're building open source tooling to ease the transition from prompt to cheaper fine-tuned model at my company OpenPipe.

sharemywin 1009 days ago

I wonder what the TOS is for a smaller model. I think it's not OK for training another LLM, but at what point does a model become an LLM?

gkbrk 1009 days ago

Approximately nobody cares about the TOS of large language models sold for money that were trained by copying the content of everyone else with 0 compensation.

kippinitreal 1009 days ago

Totally agree that this is underappreciated.

Between open source modeling tools being incredible, transfer learning allowing dirt cheap fine-tuning and now mega-models being able to instantly give you a "mostly right" data set, the cost of creating ML features has dropped to almost nothing.

Products that took quarters/years and required big budgets for labeling, ML specialists, GPUs, etc. just a few years ago can now be done in an hour or so for free (if you are scrappy). I imagine this is going to lead to a ton of great ML features that weren't worth funding in the past but are very valuable in aggregate. Similar to the mid-2000s, when the cost/ease of web development came down enough that there was a lot more experimentation and fun to be had.

lamroger 1009 days ago

Agreed! I think it's going to be hard for software developers to adjust to the "data science/engineering" mindset of monitoring and iterating on the long tail of maintenance. A lot of teams already have this issue with deterministic code running in production. I think there's a big opportunity to help purely software teams to learn and adjust.

_ea1k 1009 days ago

Honestly, this was one of the first things that excited me with chatgpt. I'm really eager to see a high performance inference engine that can keep up with my log data.

Being able to teach an AI assistant to look for specific (but not too specific) things with just a prompt would be incredibly helpful.

100k 1009 days ago

Another thing you can do with LLMs that I think is pretty interesting is use them to train a cheaper and faster model. Then use the faster model in your application.

9dev 1009 days ago

We’re doing this pretty successfully to identify products from massive text content, and even more importantly, we then perform a second pass and let the models categorise the identified products, and then do a third pass to build a category hierarchy. This gets us a full product taxonomy with practically no sweat. It’s amazing, really.

dchuk 1009 days ago

Is this in the context of web scraping? Or like in literal text/prose?

9dev 1009 days ago

Both. Although „prose“ in this case means tightly described cargo manifests.

superb-owl 1009 days ago

I couldn't agree more. I talked a bit about how amazing and magical LLM-based text classification is here: https://blog.superb-owl.link/p/the-shapes-of-stories-with-ch...

emporas 1009 days ago

Nice one. Have you thought of stripping the text of words which do not contribute much to the meaning of a sentence? This way you could squeeze the context window even more.

I have written some stories myself, with the help of GPT. I will try to parse my stories with your method. It is very interesting.

As a side note, GPT is definitely not a toy. I use it for coding, it is great! I use it to write command line apps, which do some simple data manipulation, some more complex than others, but in the order of hundreds of lines of code. They work flawlessly, without me writing even a single line of code.

bxguff 1009 days ago

the LLM ouroboros starts with models being used to create training data for models.

jstarfish 1009 days ago

I'm actually looking forward to this because the result is going to be hilarious-- the culmination of literally every slippery-slope argument ever as the models reinforce their own biases over time.

particlesy 1008 days ago

You can use Particlesy to create a custom GPT-4 bot trained just to classify csv row data and integrate systems.

One interesting use case we see is a SaaS company using our REST API to access a Particle with custom instructions just for integration with other systems. They will provide a CSV row and the GPT-4 model will classify and map the columns into their key columns. In effect, they are able to integrate with almost any system in their vertical with an out-of-the-box integration. Admittedly, it is more expensive, but it is great for the initial trial phase and the costs can be passed to the customer. https://www.particlesy.com

vinay_ys 1009 days ago

I'm waiting for a well-optimized LLM-based system built into a local editor like Obsidian, so I can ask it to scan my entire local Documents folder and have it supercharge my reading/writing locally.

rckrd 1009 days ago

We've found the same. A lot of usage through our LLM Categorization endpoint. The toughest problem was actually constraining the model to only output valid categories and not hallucinate new ones. And to only return one for single-classification (or multiple if that's the mode).
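One simple way to catch hallucinated categories is to validate the raw model text after the fact. This is a sketch of that fallback, not a description of how the endpoint above works; `parse_categories` and the `ALLOWED` list are hypothetical names.

```python
def parse_categories(raw, allowed, multi=False):
    """Map raw model text onto the allowed category set, or raise."""
    lookup = {c.lower(): c for c in allowed}
    # Accept comma- or newline-separated answers, ignoring case and padding.
    parts = [p.strip().lower() for p in raw.replace("\n", ",").split(",")]
    parts = [p for p in parts if p]
    unknown = [p for p in parts if p not in lookup]
    if unknown:
        raise ValueError(f"categories outside the allowed set: {unknown}")
    if not parts:
        raise ValueError("model returned no category")
    found = list(dict.fromkeys(lookup[p] for p in parts))  # dedupe, keep order
    if not multi and len(found) != 1:
        raise ValueError(f"expected exactly one category, got {found}")
    return found if multi else found[0]

ALLOWED = ["billing", "bug", "account"]  # hypothetical label set
print(parse_categories(" Billing\n", ALLOWED))
print(parse_categories("bug, account", ALLOWED, multi=True))
```

A raised `ValueError` is the signal to retry or fall back, which is cruder than constraining the decoder but works with any model that returns text.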

[0] https://matt-rickard.com/categorization-and-classification-w...

brap 1009 days ago

With everyone talking about LLMs being glorified autocomplete, I actually would like to see how well they perform as autocomplete. Because most built-in ones are pretty bad.

nomel 1009 days ago

I’m assuming this is how whisper works.

For some evidence, say a sentence, then say a sentence with the words scrambled. The performance nose dives, relative to audio levels.

olalonde 1009 days ago

That's essentially what GitHub Copilot is.

brap 1009 days ago

I mean classical autocomplete, the ones we have on our phones that give us one word at a time (on iOS you get 3 options to choose from)

Philpax 1009 days ago

iOS 17 is basically doing that: https://www.macrumors.com/guide/ios-17-keyboard/

sharemywin 1009 days ago

stupid chatgpt...could be the new "stupid autocorrect"

I used to love those posts.

magospietato 1009 days ago

There is a software company in the UK that is performing medical-jargon-to-simple-English translation on text emitted from speech recognition.

They achieve this via a fairly complex set of regular expressions, which must represent a significant time investment to research and maintain.

The same effect can be achieved with a properly prompted GPT-4 completion request, which took about five minutes to write.

ifyoubuildit 1009 days ago

The difference there is that you can probably look at (or design a test for) that hodge podge of regexes and understand the range of outputs.

You can prompt gpt4 and get something that looks plausible for a few test cases with very little effort, but can you get any guarantees that it will behave reasonably for most inputs? And if you can, will those guarantees last as the model is updated underneath you?

earthboundkid 1009 days ago

I would be very worried that the LLM would say something medically wrong, and we'd get sued for a lot of money. ISTM that a better thing to do is to use the LLM to generate a lot of training data that you then test your handwritten super-regex against.
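A rough sketch of that workflow, assuming the test pairs came from an LLM and were then checked by a person. The two rules and three cases below are invented for illustration and are far smaller than any real rule set.

```python
import re

# Hypothetical handwritten rules: (pattern, plain-English replacement).
RULES = [
    (re.compile(r"\bhypertension\b", re.IGNORECASE), "high blood pressure"),
    (re.compile(r"\bmyocardial infarction\b", re.IGNORECASE), "heart attack"),
]

def simplify(text):
    """Apply every rule in order; the output is fully determined by RULES."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

# Stand-in for LLM-generated (input, expected) pairs.
CASES = [
    ("Patient has hypertension.", "Patient has high blood pressure."),
    ("History of myocardial infarction.", "History of heart attack."),
    ("Signs of tachycardia.", "Signs of a fast heart rate."),
]

def failures(cases):
    """Return (input, expected, actual) for every case the rules get wrong."""
    return [(src, want, simplify(src)) for src, want in cases
            if simplify(src) != want]

for src, want, got in failures(CASES):
    print(f"MISS: {src!r} -> {got!r}, expected {want!r}")
```

Here the third case is reported as a miss because no rule covers "tachycardia". The LLM only widens the test coverage; what ships is still the deterministic regex set.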

wodenokoto 1009 days ago

The nice thing about active learning with a classic ML model is that every time you annotate a data point, the model learns.

How do you update your prompt to take the new data point into account? Or do you just add it as an example inside the prompt and let it grow?

gwern 1009 days ago

Yes. You can also do retrieval on your set of classified examples so the prompt only contains the most similar examples, and as your set grows, the prompt becomes more useful. (Even if this is still not as good as finetuning would be.) Note that you can do multiple prompts if you are willing to pay for even more accuracy, to ensemble different prompts.
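The retrieval step can be sketched roughly like this. Word-overlap cosine similarity stands in for whatever embedding search you would really use, and the labeled examples, `top_k`, and `build_prompt` are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def top_k(query, labeled, k=2):
    """Return the k labeled examples most similar to the query."""
    q = bow(query)
    return sorted(labeled, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)[:k]

def build_prompt(query, labeled, k=2):
    """Few-shot prompt containing only the nearest labeled examples."""
    shots = "\n\n".join(f"Text: {t}\nLabel: {l}" for t, l in top_k(query, labeled, k))
    return f"{shots}\n\nText: {query}\nLabel:"

# Hypothetical annotated set; it grows every time you label a new data point.
LABELED = [
    ("refund my order please", "billing"),
    ("app crashes at login", "bug"),
    ("how do I reset my password", "account"),
    ("I was charged twice for my order", "billing"),
]

print(build_prompt("please refund the double charge on my order", LABELED))
```

Adding a newly annotated example to `LABELED` changes future prompts without retraining anything, and the prompt stays at `k` examples however large the set gets.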

umutisik 1009 days ago

And very likely this is coming to computer vision too with multi-modal GPT.

thewataccount 1009 days ago

I've already had some success with Llava although existing image classifiers are still going to outperform it.