HN Mirror



	by solresol 415 days ago
	I think of it as trying to encourage the LLM to want to give answers from a particular part of the phase space. You can do it by fine tuning it to be more likely to return values from there, or you can prompt it to get into that part of the phase space. Either works, but fiddling around with prompts doesn't require all that much MLOps or compute power. That said, fine tuning small models because you have to power through vast amounts of data where a larger model might be cost ineffective -- that's completely sensible, and not really mentioned in the article.

2 comments

lyu07282 415 days ago

> That said, fine tuning small models

Mostly referred to as model distillation, but I give the author the benefit of the doubt that they didn't mean that.


sota_pop 415 days ago

My understanding of model distillation is quite different in that it trains another (typically smaller) model using the error between the new model’s output and that of the existing model, effectively capturing the existing model’s embedded knowledge and encoding it (ideally more densely) into the new one.
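For anyone who wants to see that "error between outputs" idea concretely, here is a minimal sketch. It assumes the error is measured as a KL divergence between temperature-softened output distributions, which is one common formulation rather than the only one. The function names and the temperature value are made up for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions.

    Assumption: this is one common way to score how far the student's
    output is from the teacher's; real setups often mix in a normal
    supervised loss as well.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(
        p * math.log(p / q)
        for p, q in zip(p_teacher, p_student)
        if p > 0
    )

# The loss is zero when the student matches the teacher exactly,
# and positive when the two distributions differ.
matching = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

Training the student then amounts to minimizing this value over many inputs, so the student learns the teacher's full output distribution and not just its top answer.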


lyu07282 415 days ago

What I was referring to is similar in concept, but I've seen both described in papers as distillation. What I meant was you take the output of a large model like GPT4 and use that as training data to fine-tune a smaller model.
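A rough sketch of that second flavour, where the large model's text output becomes the fine-tuning data. Everything here is hypothetical: `teacher` stands in for whatever call returns the large model's answer, and the prompt/completion JSONL layout is just one plausible format, not any particular vendor's.

```python
import json

def build_distillation_dataset(prompts, teacher):
    """Pair each prompt with the teacher's answer.

    `teacher` is a hypothetical callable (prompt -> text); in practice
    it would wrap a request to the large model.
    """
    return [{"prompt": p, "completion": teacher(p)} for p in prompts]

def to_jsonl(records):
    """Serialize records one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)

def toy_teacher(prompt):
    """Stand-in for a large model so the sketch runs offline."""
    return prompt.upper()

records = build_distillation_dataset(["hello", "world"], toy_teacher)
jsonl_text = to_jsonl(records)
```

The smaller model is then fine-tuned on these pairs with an ordinary supervised objective, so unlike the logit-matching version it only needs the teacher's text, not its internal probabilities.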


sota_pop 411 days ago

Yes, that does sound very similar. To my knowledge, isn’t that (effectively) how the latest DeepSeek breakthroughs were made? (i.e. by leveraging ChatGPT outputs to provide feedback for training the likes of R1)


cbsmith 415 days ago

> That said, fine tuning small models because you have to power through vast amounts of data where a larger model might be cost ineffective -- that's completely sensible, and not really mentioned in the article.

...which I thought was arguably the most popular use case for fine tuning these days.
