| HN Mirror

by skissane 1133 days ago

OpenAI could make this question easy to answer if they provided access to different checkpoints of their model for comparison:

(1) the foundation model (before any RLHF)

(2) RLHF for instruction-following – but not for "safety" or "truthfulness"

(3) RLHF for "safety" and "truthfulness"

But I don't believe OpenAI gives public access to (1) or (2), only to (3).

I'm also wondering whether they intentionally don't want it to be easy for people to answer this question.