|
|
|
|
|
by panarky
398 days ago
|
|
The shared reward function from pre-training is like primary school for an LLM. Maybe RLHF is like secondary school. The governor can be differentiated from the workers with different system and user prompts, fine tuning, etc., which might be similar to medical school or law school for a human. Certainly human judges, attorneys for defense and prosecution, and members of the jury can still perform their jobs well even if they attended the same primary and secondary schools. |
|