by jawerty
1058 days ago
I'm not doing RLHF; I believe that's part of how LLaMA itself is trained, in the loss/optimization phase. For the fine-tuning I'm using LoRA, via PEFT from Hugging Face, which freezes the base model weights and trains only a small set of low-rank adapter parameters.
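
As a rough illustration of what LoRA does, here is a plain-Python sketch. It is not the PEFT API and not the actual training code from this project; the function names, the 4096x4096 layer size, and the rank of 8 are assumptions chosen only for the example. The frozen weight `W` is left untouched, and only the two small matrices `A` and `B` would receive gradient updates.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (frozen, trainable) parameter counts for one LoRA-adapted linear layer."""
    frozen = d_out * d_in              # original weight W, kept fixed
    trainable = rank * (d_in + d_out)  # A is rank x d_in, B is d_out x rank
    return frozen, trainable


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x)."""
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical sizes: one 4096 x 4096 projection adapted with rank 8.
frozen, trainable = lora_param_counts(4096, 4096, 8)
print(frozen, trainable, f"{trainable / frozen:.2%}")
# prints: 16777216 65536 0.39%
```

With `B` initialized to zeros, as is common for LoRA, the adapted layer starts out identical to the frozen one, so fine-tuning begins from the pretrained behavior.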