by boto3
1242 days ago
It did, actually. The model was trained with multiple rounds of reinforcement learning in which human judges provided the feedback: first in the form of full answers, and then as rankings of candidate answers by relevance. So the model in production is probably frozen, but before that it went through multiple rounds of interaction with the world.
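To make the ranking step concrete, here is a rough sketch of one common way a judge's ranking can be turned into a training signal: a pairwise preference loss over reward-model scores. I'm assuming this form for illustration; the function name `pairwise_ranking_loss` and the scores are hypothetical, not something taken from the actual training setup.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores_best_first):
    """Mean of -log(sigmoid(s_better - s_worse)) over all ranked pairs.

    scores_best_first: hypothetical reward-model scores for candidate
    answers, listed in the order a human judge ranked them (best first).
    """
    # combinations() keeps list order, so the first item of each pair
    # is always the answer the judge ranked higher.
    pairs = list(combinations(scores_best_first, 2))
    if not pairs:
        raise ValueError("need at least two ranked answers")

    total = 0.0
    for better, worse in pairs:
        x = worse - better
        # Numerically stable softplus(x) = log(1 + exp(x)),
        # which equals -log(sigmoid(better - worse)).
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pairs)


# Scores that agree with the judge's ranking give a lower loss
# than scores that contradict it.
agrees = pairwise_ranking_loss([2.0, 1.0, 0.0])
contradicts = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agrees < contradicts)
```

Minimizing a loss like this pushes the scorer to rate the judge's preferred answers higher, and that scorer can then serve as the reward in the reinforcement learning rounds.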