by p1esk 261 days ago
You test different models on your real world problem, and pick the smallest one that works.
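
A minimal sketch of that selection loop, in case it helps make it concrete. The model names, parameter counts, `evaluate` callback, and `threshold` are all hypothetical placeholders; `evaluate` is assumed to run your own eval set and return a score between 0 and 1.

```python
from typing import Callable, Optional, Sequence, Tuple


def smallest_model_that_works(
    models: Sequence[Tuple[str, float]],
    evaluate: Callable[[str], float],
    threshold: float,
) -> Optional[str]:
    """Return the smallest model whose eval score meets the threshold.

    models:    (name, parameter_count) pairs; names are placeholders.
    evaluate:  hypothetical callback that scores a model on YOUR task (0..1).
    threshold: the score you consider "works" for your use case.
    """
    # Try candidates from smallest to largest and stop at the first pass.
    for name, _size in sorted(models, key=lambda m: m[1]):
        if evaluate(name) >= threshold:
            return name
    return None  # nothing passed; a larger model or fine-tuning may be needed


# Usage with made-up scores standing in for a real evaluation run.
fake_scores = {"small-1b": 0.71, "medium-7b": 0.88, "large-70b": 0.93}
candidates = [("large-70b", 70e9), ("small-1b", 1e9), ("medium-7b", 7e9)]
choice = smallest_model_that_works(candidates, fake_scores.__getitem__, 0.85)
```

The threshold is the real decision here: it should come from what your application can tolerate, not from a benchmark.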

I just think that there has to be some heuristic..
The closest thing to a heuristic is trying the task with non-fine-tuned models and building an intuition for how far off each model is, in which directions it's off, and how easily you can correct each of those directions via fine-tuning.

For example, for classification, if the model is hallucinating classes that are semantically similar but not technically valid, you can probably fine-tune your way out of the gap with a smaller model.
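
One rough way to build that intuition is to bucket the base model's errors by type. The sketch below is only an illustration: the function name, bucket names, and `near_cutoff` are made up, and it uses string similarity from `difflib` as a crude stand-in for semantic similarity (an embedding-based comparison would be closer to what is meant above).

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, Sequence


def categorize_errors(
    predictions: Sequence[str],
    gold: Sequence[str],
    valid_labels: Iterable[str],
    near_cutoff: float = 0.6,  # arbitrary assumption; tune for your labels
) -> Dict[str, int]:
    """Count predictions by error type for a classification task."""
    valid = set(valid_labels)
    counts = {"correct": 0, "wrong_valid": 0,
              "near_miss_invalid": 0, "far_invalid": 0}
    for pred, truth in zip(predictions, gold):
        if pred == truth:
            counts["correct"] += 1
        elif pred in valid:
            # A real class, just the wrong one.
            counts["wrong_valid"] += 1
        else:
            # Hallucinated class: how close is it to any valid label?
            # String similarity is only a proxy for semantic similarity.
            best = max(
                SequenceMatcher(None, pred.lower(), v.lower()).ratio()
                for v in valid
            )
            key = "near_miss_invalid" if best >= near_cutoff else "far_invalid"
            counts[key] += 1
    return counts


# Usage with toy data (labels and predictions are invented).
labels = {"billing", "shipping", "returns"}
preds = ["billing", "billings", "shipping", "weather"]
truth = ["billing", "billing", "returns", "returns"]
report = categorize_errors(preds, truth, labels)
```

A large `near_miss_invalid` share is the case described above, where fine-tuning a smaller model is likely to close the gap. A large `far_invalid` or `wrong_valid` share suggests the model is further off.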

But if your task requires world knowledge, you likely need a larger model. It's not cheap, efficient, or generally useful to fine-tune for additional world knowledge directly.