by dartos
621 days ago
Sure, hand-wave away my entire comment as “nonsense” and ignore how statistics works. Training a model on synthetic data (obviously) amplifies the bias already present in the initial dataset[1], which makes it poor training data. IIRC (this subject is a little fuzzy for me), using synthetic data for RLHF is equivalent to just using DPO, so if they did RLHF, it probably wasn’t with synthetic data. They may have gone with DPO, though.

[1] https://arxiv.org/html/2403.07857v1
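To make the statistics point concrete, here is a toy simulation. It is not the experiment from [1]; it is the simpler "fit a model to its own samples" illustration, assuming each generation is a Gaussian fit by maximum likelihood to samples drawn from the previous generation's fit. The sample size (10), generation count (20), and chain count are arbitrary choices for illustration. Because each fit's estimation error is inherited by the next generation instead of being corrected against real data, the fitted variance drifts downward: with the ML (divide-by-n) estimator, the expected variance after T generations is ((n-1)/n)^T of the original.

```python
import random
import statistics


def fit_gaussian(samples):
    """Maximum-likelihood fit: sample mean and population (divide-by-n) std dev."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def collapse_chain(n_samples, generations, rng):
    """Repeatedly fit a Gaussian to synthetic samples from the previous fit.

    Generation 0 is the "real" distribution N(0, 1); every later generation
    sees only samples produced by its predecessor. Returns the fitted
    variance after the last generation.
    """
    mu, sigma = 0.0, 1.0
    for _ in range(generations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu, sigma = fit_gaussian(samples)
    return sigma ** 2


def mean_final_variance(n_samples=10, generations=20, chains=2000, seed=0):
    """Average the final variance over many independent chains (fixed seed)."""
    rng = random.Random(seed)
    finals = [collapse_chain(n_samples, generations, rng) for _ in range(chains)]
    return statistics.fmean(finals)


if __name__ == "__main__":
    # Theoretical expectation for these defaults: 0.9 ** 20, roughly 0.12
    print(f"mean variance after 20 generations: {mean_final_variance():.3f}")
```

The tails are the first thing to go: rare values are undersampled, so each refit sees a narrower distribution than the one before it.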
Researchers are using synthetic data to train LLMs, especially for fine-tuning, and especially for instruction fine-tuning. You are not up to date with recent work on LLMs.