Hacker News
by simonw 463 days ago
I've had trouble getting a great answer to this question - I ask it in various places every month or so, most recently here: https://nitter.net/simonw/status/1895301139819860202

On paper, fine-tuning smaller models can greatly reduce the cost of a specific task, but I've not heard many real-world success stories around that.

I think vision LLMs are one of the most interesting applications here - things like fine-tuning for better results extracting data from a specific paper form or report structure. Again, not many public examples of that.

3 comments

Oh there's a lot! Some cool examples I see:

1. Codebases, docs, large corpora of internal datasets - fill in the middle, auto completion etc.

2. I know a tonne of financial institutions use fine-tuning for trading, real-time data parsing, headline analysis, signal creation etc.

3. Distillation is also relatively common - taking the outputs of a large model and distilling them into a small model

4. Increasing accuracy is the most important goal, not cost or latency. We find that if you solve the fine-tuning life cycle, i.e. continuous automatic fine-tuning, data filtering, and reinforcement learning via DPO, it works well!

5. Lots of organizations use DPO and preference fine-tuning to align models since they have tonnes of feedback data!

6. Yep, vision fine-tuning! E.g. medical diagnosis, docs, QA on pics etc.

7. And obviously the large model labs fine-tune all their base models, e.g. GPT-4.5 is a fine-tune of a base model

8. Finally, reasoning fine-tuning via GRPO is very cool! If you have inputs and outputs but no labelled CoT (chain of thought) in between, GRPO is the way to go! Companies can write custom reward functions!
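
A minimal sketch of what item 8 can look like in practice, assuming you only have prompts and expected final answers. It shows the two pieces a GRPO-style setup needs: a custom reward function and advantages normalized within a group of samples for the same prompt. The function names, the "Answer:" convention and the exact-match reward are all assumptions made for illustration. They are not from the comment above or from any particular library.

```python
import statistics


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical reward: 1.0 if the text after the last 'Answer:' equals expected."""
    marker = "Answer:"
    if marker not in completion:
        return 0.0  # no final answer given, so no reward
    answer = completion.rsplit(marker, 1)[1].strip()
    return 1.0 if answer == expected.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center and scale rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an illustrative choice
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt whose expected answer is "42".
# The reasoning text before "Answer:" is never labelled or scored directly.
completions = [
    "17 + 25 is 42. Answer: 42",
    "Roughly forty. Answer: 40",
    "Answer: 42",
    "I am not sure.",
]
rewards = [exact_match_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average get a positive advantage and the rest get a negative one, which is why no labelled reasoning in between is needed: only the final output is checked.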

"Codebases, docs, large corpora of internal datasets"

I still haven't seen a convincing demo of using fine-tuning to "teach" a model new information from additional documents. I'd love to see one.

(Closest I've come to that is I heard a rumor that Jane Street have fine-tuned an LLM for OCaml)

Here is a small LLM I trained to output dollars and cents from a verbal numeric amount:

https://huggingface.co/TrevorJS/check-amount-deverbalizer-sm...
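
For readers wondering what training data for a task like this could look like, here is a minimal sketch that generates (verbal amount, numeric amount) pairs synthetically. This is an assumption about one possible approach, not a description of how the model linked above was actually trained. The names `words`, `make_pair` and `make_dataset`, and the "prompt"/"completion" keys, are hypothetical.

```python
import random

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def words(n: int) -> str:
    """Spell out an integer in the range 0..999,999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return words(thousands) + " thousand" + (" " + words(rest) if rest else "")


def make_pair(total_cents: int) -> dict:
    """Build one training example from an amount given in cents."""
    dollars, cents = divmod(total_cents, 100)
    dollar_unit = "dollar" if dollars == 1 else "dollars"
    cent_unit = "cent" if cents == 1 else "cents"
    prompt = f"{words(dollars)} {dollar_unit} and {words(cents)} {cent_unit}"
    return {"prompt": prompt, "completion": f"{dollars}.{cents:02d}"}


def make_dataset(n: int, seed: int = 0) -> list[dict]:
    """Sample n random amounts below one million dollars."""
    rng = random.Random(seed)
    return [make_pair(rng.randrange(0, 100_000_000)) for _ in range(n)]
```

Real handwritten or spoken amounts are messier than this (for example "and 05/100"), so a generator like this would only be a starting point.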

Vision LLMs are definitely an interesting application.

At Avy.ai we're running small (2B-7B, quantized) vision models as part of a Mac desktop application for understanding what someone is working on in the moment, to offer them related information and actions.

We found that the raw results in understanding the images with a light LoRA fine-tune are not substantially different, but the ease of getting a small model to follow instructions, outputting structured data in response to the image at the level of verbosity and detail we need, is greatly enhanced by fine-tuning. Without fine-tuning, the models on the smaller end of that scale would be much more difficult to use, since they do not reliably produce output that matches what the consuming application expects.
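
One way to put a number on "reliably producing output that matches what the consuming application expects" is to measure how often raw model outputs parse and fit the expected shape, before and after fine-tuning. The sketch below assumes JSON output. The field names in `EXPECTED_FIELDS` are made up for illustration and are not Avy.ai's actual schema.

```python
import json

# Hypothetical schema: field name -> required Python type.
EXPECTED_FIELDS = {"app": str, "activity": str, "entities": list}


def matches_schema(raw: str) -> bool:
    """True if raw is a JSON object with exactly the expected fields and types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(data[key], kind) for key, kind in EXPECTED_FIELDS.items())


def conformance_rate(outputs: list[str]) -> float:
    """Fraction of model outputs the consuming application could use as-is."""
    if not outputs:
        return 0.0
    return sum(matches_schema(o) for o in outputs) / len(outputs)


# Stand-in outputs; in practice these would come from the model under test.
sample_outputs = [
    '{"app": "Mail", "activity": "writing a reply", "entities": ["invoice"]}',
    'Sure! Here is the JSON you asked for: {"app": "Mail"}',
    '{"app": "Mail", "activity": "reading"}',
]
rate = conformance_rate(sample_outputs)
```

Running the same check on a base model and a fine-tuned one, over the same inputs, gives a simple comparison of instruction-following that is separate from image-understanding quality.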

Was constrained decoding not enough to force the output to be in a specific format?

Using a grammar to force decoding of, say, valid JSON would work, but that hasn't always been available in the implementations we've been using (like MLX). It is solvable with software engineering, by adding that to the decoders in those frameworks, but fine-tuning has been effective without that work.

The bigger thing, though, was getting the models to have the appropriate levels of verbosity and detail in their output, which fine-tuning made more consistent.
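
For anyone unfamiliar with the idea, here is a toy sketch of grammar-constrained decoding: at each step only tokens the grammar allows are considered, and the highest-scoring allowed token wins. Everything here is an assumption for illustration, including the tiny vocabulary, the hand-written state machine and the `chatty_scores` stand-in for a model. It is not the API of MLX or any other framework, where the same idea is applied by masking logits over the real tokenizer vocabulary.

```python
import json

VOCAB = ["{", "}", '"amount"', ":", "0", "1", "2", "Sure", "!", "<eos>"]

# Tiny hand-written grammar: state -> {allowed token: next state}.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"amount"': "colon"},
    "colon": {":": "value"},
    "value": {"0": "close", "1": "close", "2": "close"},
    "close": {"}": "end"},
    "end": {"<eos>": "done"},
}


def chatty_scores(prefix: list[str]) -> dict[str, float]:
    """Stand-in for a model that would rather chat than emit JSON."""
    scores = {token: 0.0 for token in VOCAB}
    scores["Sure"] = 5.0
    scores["!"] = 4.0
    scores["2"] = 1.0
    return scores


def constrained_greedy_decode(score_fn, max_steps: int = 16) -> str:
    """Greedy decoding restricted to tokens the grammar allows in each state."""
    state, output = "start", []
    for _ in range(max_steps):
        if state == "done":
            break
        scores = score_fn(output)
        allowed = GRAMMAR[state]
        # Ignore every token outside the allowed set, however high its score.
        token = max(allowed, key=lambda t: scores[t])
        if token != "<eos>":
            output.append(token)
        state = allowed[token]
    return "".join(output)


result = constrained_greedy_decode(chatty_scores)
parsed = json.loads(result)
```

The grammar guarantees the shape of the output, but it says nothing about how much detail a model puts into any free-text field, which is consistent with the point above that verbosity and detail were the bigger win from fine-tuning.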

We use multiple post-trained models in production, at scale at https://osmos.io

Have you published details of how you're doing that anywhere?

Could be a useful marketing strategy for you, given how starved we all are of fine-tuning success stories.

Things have been moving so fast that it’s honestly hard for a small team to do that in parallel.

I got to present at GCP Next about a part of this last year: https://www.youtube.com/watch?v=5QsM1K9ahtw

I’m presenting in one (and maybe two) sessions with more info on the training side this year.