by visarga
1043 days ago
Not necessarily. For example, Anthropic's Constitutional AI (CAI) uses the model itself in place of human judgments in RLHF, which is essentially RLAIF. CAI data is used to fine-tune the Claude model. In general, you need data at level N+1 when you are at level N. We can amplify models by giving them more time, self-reflection, step-by-step planning, or external tools, by tuning them on human preferences, or by giving them feedback from executing code or from a robot.
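To make the "model substitutes for human judgments" idea concrete, here is a minimal sketch of AI-feedback preference labeling. It is not Anthropic's implementation: `label_with_ai_feedback`, `toy_judge`, and the principle string are hypothetical names invented for illustration, and the judge is a trivial stand-in for what would be a model call in practice.

```python
from typing import Callable, List, Tuple

# Hypothetical signature: (principle, prompt, response) -> score.
# In a real RLAIF setup this would be a call to a language model.
Judge = Callable[[str, str, str], float]
PreferencePair = Tuple[str, str, str]  # (prompt, chosen, rejected)


def label_with_ai_feedback(
    candidates: List[Tuple[str, str, str]],  # (prompt, response_a, response_b)
    judge: Judge,
    principle: str,
) -> List[PreferencePair]:
    """Build preference pairs using a judge model instead of human raters."""
    labeled = []
    for prompt, a, b in candidates:
        score_a = judge(principle, prompt, a)
        score_b = judge(principle, prompt, b)
        # The higher-scoring response becomes "chosen"; ties favor response a.
        chosen, rejected = (a, b) if score_a >= score_b else (b, a)
        labeled.append((prompt, chosen, rejected))
    return labeled


def toy_judge(principle: str, prompt: str, response: str) -> float:
    # Assumption: a keyword check stands in for the model's judgment.
    return -1.0 if "insult" in response.lower() else 1.0


pairs = label_with_ai_feedback(
    [("How do I reply to a rude email?", "Send an insult back.", "Stay polite and brief.")],
    toy_judge,
    "Prefer the more harmless response.",
)
print(pairs)
```

The resulting pairs have the same shape as human preference data, so they could in principle feed the same preference-tuning step that RLHF uses.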
|
Maybe working up a proof and then quizzing yourself on it?
As long as we get >N supervision and the difference is larger than the model's regression, it seems that could work. But there seems to be a definite limit to that: the N to N+1 difference will only stay above the improvement delta up to a point.
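A toy way to picture that limit, purely as an illustration: assume each round's gain is the supervision gap minus a fixed regression, and assume the gap shrinks by a constant factor every round. Both assumptions, the function `simulate`, and all the numbers are made up for this sketch and are not measurements of any real model.

```python
from typing import List


def simulate(rounds: int, initial_gap: float, gap_decay: float, regression: float) -> List[float]:
    """Track model level while the supervision gap still exceeds the regression."""
    level = 0.0
    gap = initial_gap
    history = []
    for _ in range(rounds):
        gain = gap - regression  # net improvement for this round
        if gain <= 0:
            break  # supervision no longer outweighs the regression
        level += gain
        history.append(level)
        gap *= gap_decay  # assumption: the gap shrinks geometrically
    return history


history = simulate(rounds=20, initial_gap=1.0, gap_decay=0.5, regression=0.2)
print(len(history), history)
```

With these made-up numbers the loop stops after three rounds even though twenty were allowed, which is the plateau described above: once the gap drops below the regression, further rounds stop helping.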