by lagrange77
106 days ago
> There is a nontrivial amount of RL training (RLHF, RLVR, ...), so it would be reasonable to call it an RL model.

Hm, as I understand it, parts of the training of, e.g., ChatGPT could be called RL models. But the thing being trained/fine-tuned is still a seq2seq, next-token-predicting transformer neural net.