by lagrange77
938 days ago
ChatGPT was trained (in an additional step on top of the supervised learning of the base LLM) with reinforcement learning from human feedback (RLHF), in which contractors were shown two LLM outputs for the same prompt and had to decide which one was better. This was a core ingredient in the system's performance.
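For anyone wondering how a "which one is better" click turns into a training signal: a common way to do it (I'm assuming a Bradley-Terry style pairwise loss here, which is not something stated above) is to train a reward model so that the preferred output scores higher than the rejected one. A minimal sketch, where `preference_loss` and the scalar reward inputs are hypothetical names for illustration:

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Illustrative only: the two rewards stand in for the scalar scores a
    reward model would assign to the two outputs a labeler compared.
    """
    diff = reward_chosen - reward_rejected
    # -log(sigmoid(diff)) equals softplus(-diff); this form avoids overflow.
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


# Loss is small when the model agrees with the labeler, large when it disagrees.
agree = preference_loss(2.0, 0.0)
disagree = preference_loss(0.0, 2.0)
print(agree < disagree)
```

The reward model trained this way is then what the RL step optimizes the LLM against, so the labelers' pairwise choices only reach the LLM indirectly.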