by aidan_mclau
735 days ago
Hey! Essay author here.

>The cool thing about using modern LLMs as an eval/policy model is that their RLHF propagates throughout the search.

>Moreover, if search techniques work on the token level (likely), their thoughts are perfectly interpretable. I suspect a search world is substantially more alignment-friendly than a large model world.

Let me know your thoughts!
Mobile Safari, phone set to French.