by Dzugaru
1179 days ago
It was already trained like this: RLHF [0] is a loop that has ways to feed the model the information it needs to "move forward", i.e. update its weights. It was fine-tuned like this. It certainly did ignite something - maybe sparks of AGI, maybe just sparks of pretending to be more liked by humans (hence the Human Feedback in RLHF). My guess is that loops by themselves (Reflexion [1], etc.) won't change much. ChatGPT's massive logs - maybe.

[0] https://huggingface.co/blog/rlhf
[1] https://arxiv.org/abs/2303.11366