| HN Mirror



	by jawerty 1058 days ago
	So I'm not doing RLHF; that's how LLaMA is pre-trained. I believe it's in the loss/optimization phase of their training. For the fine-tuning I'm using LoRA to freeze most of the layers for parameter optimization, using PEFT from Hugging Face.
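
For readers unfamiliar with the LoRA idea mentioned above, here is a minimal pure-Python sketch of it: the original weight matrix stays frozen and only a small low-rank pair of matrices would be trained. This is not the PEFT API or the poster's code. The class name `LoRALinear`, the helper `matvec`, and the values for `r` and `alpha` are hypothetical and chosen only for illustration.

```python
import random


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Illustrative layer: frozen weight W plus a low-rank update (alpha / r) * B @ A.

    Hypothetical sketch only; real libraries such as PEFT wrap existing
    model layers instead of reimplementing them like this.
    """

    def __init__(self, W, r, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: never updated during fine-tuning
        # A starts small and random, B starts at zero, so the layer
        # initially behaves exactly like the frozen base layer.
        self.A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)                    # frozen path
        delta = matvec(self.B, matvec(self.A, x))   # trainable low-rank path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


# Small demo with an 8x8 identity matrix standing in for a pretrained weight.
W = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
layer = LoRALinear(W, r=2, alpha=4.0)
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

With a rank of 2 on an 8x8 weight, the trainable matrices hold half as many values as the frozen one; on real model dimensions the ratio is far smaller, which is the point of the technique.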

1 comments

hallqv 1057 days ago

RLHF is not part of LLaMA pretraining, or pretraining of any other model for that matter. RLHF comes after pretraining. https://twitter.com/Jeande_d/status/1661833563069620247/phot...


jsmith45 1057 days ago

Seems like a classic case of a term of art overlapping with normal English terminology.

Knowing that you will be doing further training on a provided model (even "just" extensive fine-tuning), one would want to distinguish the training done before you get your hands on it, from the training you do. An obvious word for that previous training is pre-training, which unfortunately conflicts with a term of art.


jawerty 1057 days ago

I see, that’s my misunderstanding. I was grouping all training as pretraining.


wilhelm____ 1057 days ago

Pre-training develops the language model's base understanding of conditional word probabilities.

SFT and RLHF attempt to further guide the model in terms of steerability and alignment of output.

In fact, the InstructGPT authors were worried about losing the pre-trained model's underlying probability distribution, so they tried a version that penalizes the model for deviating too significantly from the original distribution (using a KL term). I don't remember them seeing a significant difference in performance.
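
To make the KL idea concrete, here is a minimal sketch of a reward that is reduced as the tuned model's output distribution drifts from a reference distribution. It is an illustration under stated assumptions, not the InstructGPT implementation: the function names, the toy probabilities, and the coefficient `beta=0.1` are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same tokens."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def penalized_reward(reward, policy_probs, reference_probs, beta=0.1):
    """Subtract a KL penalty from the raw reward.

    policy_probs: next-token distribution of the model being tuned.
    reference_probs: distribution of the original (reference) model.
    beta: hypothetical penalty strength; larger values keep the tuned
          model closer to the reference.
    """
    return reward - beta * kl_divergence(policy_probs, reference_probs)


reference = [0.9, 0.1]   # toy distribution from the original model
unchanged = [0.9, 0.1]   # tuned model that has not drifted
drifted = [0.5, 0.5]     # tuned model that has moved away

print(penalized_reward(1.0, unchanged, reference))  # no penalty applied
print(penalized_reward(1.0, drifted, reference))    # reward is reduced
```

The design choice is a trade-off: the raw reward pulls the model toward preferred outputs, while the KL term pulls it back toward the distribution it started with.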
