by skissane
1133 days ago
OpenAI could make it easy to answer this question if they provided access to different checkpoints of their model for comparison: (1) the foundation model (before any RLHF); (2) RLHF for instruction-following, but not for "safety" or "truthfulness"; (3) RLHF for "safety" and "truthfulness". But I don't believe OpenAI gives public access to (1) or (2), only to (3). I also wonder whether they intentionally don't want it to be easy for people to answer this question.