

by IVCrush 1169 days ago

If I'm understanding correctly, the question is, why would you ever need to move off just the base LLMs if they're already fantastic at NLP tasks?

The main reason you'd want to move from a prompted LLM to a smaller, fine-tuned NLP model (even if it's still an LLM) is usually to cut latency and save money on compute.

Out-of-the-box, the popular LLMs are pretty great for most NLP tasks. Because of this, you can quickly bootstrap a first version of your NLP applications (text analytics, unstructured data extraction, etc.) using just prompting.

For a lot of these tasks, though, you don't need the full expressive power of the base LLM. So the idea is that you take the data you collect from the first prompted version and use it to fine-tune either a smaller LLM or an even simpler, traditional model.

These smaller models are usually faster and cheaper to run, which can save you a lot of money at scale.
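To make that concrete, here's a rough sketch of the "use the prompted model's outputs as training data" step. Everything in it is an assumption for illustration: the `logged` pairs are made up, and the Naive Bayes classifier is just a stand-in for whatever smaller model you'd actually pick (a fine-tuned small LLM, logistic regression, etc.).

```python
import math
from collections import Counter, defaultdict

# Hypothetical log of (input text, label the prompted LLM produced).
logged = [
    ("refund my order please", "billing"),
    ("I was charged twice", "billing"),
    ("the app crashes on login", "bug"),
    ("error when I open the app", "bug"),
]

def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()

def train_naive_bayes(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab

def predict(model, text):
    """Return the most likely label, using add-one smoothing."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The LLM's logged outputs act as the training labels for the small model.
model = train_naive_bayes(logged)
```

In practice you'd want far more than four examples, and you'd hold some of the logged data back to check the small model against the LLM before trusting it.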

2 comments

paulgb 1169 days ago

Thanks for the explanation, this is very interesting! Just so that I’m sure I’m understanding, this doesn’t have to do with what GradientJ currently offers, or does it?

IVCrush 1169 days ago

It does! Though right now we're focused on what teams need to get that first version out the door, ultimately, we want to offer people a platform that lets them manage their NLP app throughout its lifecycle (LLM or otherwise).

Going through that process of idea -> first model -> optimized model is the core "loop" of the LLM lifecycle. The problem is that to do it effectively, you need to set up the right infrastructure to both aggregate the data going into and coming out of your model AND set up benchmarks to run experiments.

Having this data-eval engine set up is what lets you easily (or even autonomously) evaluate whether it makes sense to switch from that prompted model to a smaller model.
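As a hedged sketch of what that evaluation could boil down to: compare the smaller model's outputs against the prompted LLM's logged outputs on the same inputs, then check agreement and cost. The function names, the 0.95 agreement threshold, and the cost figures are all hypothetical, not anything GradientJ-specific.

```python
def agreement_rate(reference_outputs, candidate_outputs):
    """Fraction of inputs where the candidate matches the reference exactly."""
    if len(reference_outputs) != len(candidate_outputs):
        raise ValueError("output lists must be the same length")
    if not reference_outputs:
        return 0.0
    matches = sum(r == c for r, c in zip(reference_outputs, candidate_outputs))
    return matches / len(reference_outputs)

def should_switch(agreement, small_cost, large_cost, min_agreement=0.95):
    """Switch only if the small model is cheaper AND agrees often enough.

    min_agreement is an assumed threshold; pick one that fits your task.
    """
    return agreement >= min_agreement and small_cost < large_cost

# Hypothetical benchmark: logged LLM outputs vs. the small model's outputs.
llm_outputs = ["billing", "bug", "billing", "bug"]
small_outputs = ["billing", "bug", "billing", "billing"]

rate = agreement_rate(llm_outputs, small_outputs)
decision = should_switch(rate, small_cost=0.10, large_cost=1.00)
```

Exact-match agreement only makes sense for classification-style outputs; for free-form text you'd swap in whatever benchmark metric fits the task.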

Right now, GradientJ lays some of the rudimentary groundwork for this loop by letting you set up testing for prompt-based LLMs and automatically aggregate the input/output data that goes through your model in production. We've got some basic fine-tuning capabilities, but really we're still working on refining the tools to use that data to evaluate across multiple NLP models (both LLM and non-LLM).

paulgb 1169 days ago

I see, very cool!

artembugara 1169 days ago

They literally got me signing a contract after explaining that haha.

anoy8888 1169 days ago

Thanks for explaining