


	by lagrange77 106 days ago
	> There is a nontrivial amount of RL training (RLHF, RLVR, ...), so it would be reasonable to call it an RL model.

	Hm, as I understand it, parts of the training of, e.g., ChatGPT could be described as RL. But the thing being trained or fine-tuned is still a seq2seq next-token-predictor transformer neural net.

1 comment

hexaga 105 days ago

RL is simply a broad category of training methods. It's not really an architecture per se: modern GPTs are trained first on a reconstruction objective (next-token prediction) over massive text corpora (the 'large language' part), then on various RL objectives, plus or minus more post-training, depending on the lab.
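To make the "training method, not architecture" point concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: the "model" is a tiny bigram logit table rather than a transformer, the reward function is made up, and the RL update is plain REINFORCE, not what any lab actually runs. The only thing it is meant to show is that the same parameters get updated first by a supervised next-token loss and then by a reward-driven objective.

```python
import math
import random

VOCAB = ["a", "b", "c"]  # toy vocabulary (hypothetical)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_model():
    # The "architecture": one row of next-token logits per previous token.
    return [[0.0] * len(VOCAB) for _ in VOCAB]


def pretrain_step(model, prev, target, lr=0.5):
    # Supervised next-token prediction. The gradient of cross-entropy
    # with respect to the logits is (probs - one_hot(target)).
    probs = softmax(model[prev])
    for j in range(len(VOCAB)):
        grad = probs[j] - (1.0 if j == target else 0.0)
        model[prev][j] -= lr * grad


def rl_step(model, prev, reward_fn, rng, lr=0.5):
    # REINFORCE: sample a token from the model, score it with a reward,
    # then ascend reward * grad(log pi(token | prev)).
    probs = softmax(model[prev])
    action = rng.choices(range(len(VOCAB)), weights=probs)[0]
    reward = reward_fn(action)
    for j in range(len(VOCAB)):
        grad_logp = (1.0 if j == action else 0.0) - probs[j]
        model[prev][j] += lr * reward * grad_logp
    return action, reward


model = make_model()
A, B, C = 0, 1, 2

# Stage 1: "pretraining" on a corpus where "a" is always followed by "b".
corpus = [A, B, A, B, A, B, A, B]
for prev, nxt in zip(corpus, corpus[1:]):
    pretrain_step(model, prev, nxt)
predicted_after_a = max(range(len(VOCAB)), key=lambda j: model[A][j])
p_before = softmax(model[A])[C]

# Stage 2: RL on the *same* table, with a made-up reward that prefers "c".
rng = random.Random(0)
for _ in range(200):
    rl_step(model, A, lambda tok: 1.0 if tok == C else 0.0, rng)
p_after = softmax(model[A])[C]

print("P(c | a) after pretraining:", p_before)
print("P(c | a) after RL:", p_after)
```

The table never changes shape between the two stages; only the objective that drives the updates does, which is the sense in which "RL" names a training stage rather than a kind of network.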
