by reissbaker 656 days ago
It's totally task-dependent. On some tasks, large well-trained models are already great; if the large model already exhibits human-level performance, a small-model finetune is unlikely to beat it. Similarly, on very general tasks (e.g. "coding" in general, as opposed to a specific task like "writing idiomatic NextJS"), a small-model finetune is unlikely to beat a large model.

But there are plenty of tasks that even large, well-trained models struggle with. If the OP is struggling to get useful root-cause analysis for cloud service incidents out of an existing large model, that seems exactly like a use case where a finetune would shine.

Also, finetunes don't have to be just for small models! Medium-sized models like Llama-3.1-70b can be finetuned, and if you want to burn a lot of GPUs you can finetune 405b as well.

1 comment

Yes, I was specifically thinking of the latter, with larger models. I think many-shot ICL still tends to outperform, but you're right that it's worth trying both for your use case.