	by boto3 1242 days ago
	It did, actually. The model was trained with multiple rounds of reinforcement learning in which human judges provided the feedback: first by supplying full answers, and then by ranking candidate answers by relevance. So the model in production is probably frozen, but before that it went through multiple rounds of interaction with the world.
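
	To make the ranking step concrete, here is a minimal sketch of how a judge's ranking could be turned into a training signal. It assumes a Bradley-Terry style pairwise loss, which is a common choice for learning from rankings and not necessarily what was used for this model. The function names and the idea of a scalar "score" per answer are hypothetical, for illustration only.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the answer the judge preferred already has the
    higher score, and large when the order is reversed.
    """
    diff = score_preferred - score_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def loss_for_ranking(scores_best_to_worst: list[float]) -> float:
    """Average the pairwise loss over every (better, worse) pair in a ranking.

    `scores_best_to_worst` holds hypothetical scores for the candidate
    answers, listed in the order the human judge ranked them.
    """
    pairs = [
        (better, worse)
        for i, better in enumerate(scores_best_to_worst)
        for worse in scores_best_to_worst[i + 1:]
    ]
    if not pairs:
        raise ValueError("need at least two ranked answers")
    return sum(pairwise_ranking_loss(b, w) for b, w in pairs) / len(pairs)


if __name__ == "__main__":
    # Scores that agree with the judge's ranking give a lower loss
    # than scores that contradict it.
    print(loss_for_ranking([3.0, 2.0, 1.0]))
    print(loss_for_ranking([1.0, 2.0, 3.0]))
```

	In a setup like this, the scores would come from a learned scoring model, and minimizing the loss pushes that model to agree with the human rankings. How the scores are then used to update the answering model is a separate step not shown here.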

jmugan 1242 days ago

The reinforcement learning was on giving the right answer, not on interacting with the world. But there is movement in the right direction with https://ai.googleblog.com/2022/12/rt-1-robotics-transformer-... and other RL work. (RT-1 itself isn't RL, but there is other related work that is.)

boto3 1242 days ago

Oh, you meant interaction as joint training on images, actions, feedback, etc. That would be the next generation, I guess.

I am simply thinking of interaction here as similar to learning a language in a classroom. First the teacher provides sample questions and answers, then the teacher asks the students to come up with answers themselves and tells them which one is better. The end result, I think, is that ChatGPT is quite good at answering questions and can pass as a human, especially if it's augmented with a fact database so that obviously wrong answers can be pruned.