| HN Mirror



	by cubefox 1128 days ago
	It's very different. We don't know exactly what the model considers good after fine-tuning (which can lead to surprising cases of misalignment), while the probability that something is the next token in the training distribution is very clear. I don't know exactly how they measure it, but they can apparently measure the "loss", which (I think) says how close the model's predicted probabilities are to the actual next-token distribution in the training data.
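
	To make the "loss" idea concrete, here is a minimal sketch, assuming it is the usual cross-entropy on the next token (the standard choice for language model pretraining, though I can't say exactly what any given lab measures). The function name, the toy probabilities, and the token strings are all made up for illustration.

```python
import math

def next_token_loss(predicted_probs, actual_tokens):
    """Average cross-entropy (in nats) of the actual next tokens.

    predicted_probs: one dict per position, mapping token -> probability
                     assigned by the (hypothetical) model.
    actual_tokens:   the token that really came next at each position.
    """
    total = 0.0
    for probs, token in zip(predicted_probs, actual_tokens):
        # Penalize the model by -log of the probability it gave the true token.
        total += -math.log(probs[token])
    return total / len(actual_tokens)

# Hypothetical toy predictions for a two-token continuation.
predicted_probs = [
    {"the": 0.5, "a": 0.3, "cat": 0.2},
    {"the": 0.1, "a": 0.1, "cat": 0.8},
]
actual_tokens = ["the", "cat"]

loss = next_token_loss(predicted_probs, actual_tokens)
perplexity = math.exp(loss)  # exp of the average loss
print(f"loss={loss:.4f} perplexity={perplexity:.4f}")
```

	The loss goes to zero only if the model puts probability 1 on every token that actually came next, so a lower loss means the predicted distribution is closer to the training data. There is no equally clean number for "what the model considers good" after fine-tuning.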


brookst 1128 days ago

What I meant was, fine-tuning is not substantially different from training. It seems odd to use different words for the resulting systems.


cubefox 1128 days ago

But fine-tuning is very different from (pre)training. Pretraining proceeds via unsupervised learning on massive amounts of data and compute, while fine-tuning uses much smaller amounts of both, with supervised learning (instruction tuning) and reinforcement learning (RLHF, constitutional AI).
