|
|
|
|
|
by Eisenstein
783 days ago
|
|
"The quality of the prompts used in SFT and the preference rankings used in PPO and DPO played a crucial role in the performance of the aligned models. Meta's team carefully curated this data and performed multiple rounds of quality assurance on annotations provided by human annotators." * https://www.unite.ai/everything-you-need-to-know-about-llama... |
|