by ajb117
1117 days ago
My guess is that it's because they've already done RLHF on top of the standard next-token-prediction training. In other words, they can't cheaply fine-tune ChatGPT on post-2021 data: training on next-token prediction again would likely undo some of the RLHF objective, so they'd then have to redo RLHF to make sure it still gives good, human-like output. I mention "undoing RLHF" because it's not uncommon for a model's error on its original training objective to increase after it's fine-tuned on a different one. I think people saw this happen with BERT. Also, ChatGPT is almost certainly huge, which makes any retraining expensive.
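
To make the "undoing" effect concrete, here's a toy sketch of it. This is not how ChatGPT is trained; it's a one-parameter linear model with two made-up objectives (`task_a` and `task_b`, both hypothetical), only meant to show the error on the first objective going up after fine-tuning on the second.

```python
# Toy illustration: fine-tuning on objective B raises the loss on objective A.
# The model is y = w * x with a single weight; both tasks are invented.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(w, data, lr=0.05, steps=200):
    """Plain gradient descent on the MSE, starting from weight w."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

xs = [0.5, 1.0, 1.5, 2.0]
task_a = [(x, 2.0 * x) for x in xs]   # "original" objective: y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # "fine-tuning" objective: y = -x

w = train(0.0, task_a)            # train on A first
loss_a_before = mse(w, task_a)    # close to zero

w = train(w, task_b)              # then fine-tune on B only
loss_a_after = mse(w, task_a)     # much larger: A has been "forgotten"

print(f"loss on A before fine-tuning: {loss_a_before:.6f}")
print(f"loss on A after fine-tuning:  {loss_a_after:.6f}")
```

In a real model the two objectives overlap far more than these do, so the damage is usually partial rather than total, but the direction of the effect is the same.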