Hacker News
by gunalx 98 days ago
Probably just means SFT on a base model, vs. behavioural DPO and/or SFT on an instruction-tuned model.