by verdverm
1130 days ago
RLHF is integral to ChatGPT: you cannot get ChatGPT without it. This happens before any feedback from users and was part of the initial training process. Without RLHF, you only have GPT-3, which is very different from the successors that people are actually enthralled with and worried about. RLHF stands for Reinforcement Learning from Human Feedback. Thus you cannot get ChatGPT without humans in the loop, making it quite sensitive and irreproducible.