by dontwearitout
681 days ago
Claude notably does not use RLHF, but RLAIF: an LLM generates the preference labels based on a "constitution", in place of human preferences. It's remarkable that it can bootstrap itself up to such high quality. See https://arxiv.org/pdf/2212.08073 for more.
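For anyone who wants the preference-labeling idea in concrete terms, here is a rough sketch, not Anthropic's actual pipeline. Everything in it is an assumption made for illustration: `CONSTITUTION` is a made-up one-line principle, `toy_judge` is a keyword check standing in for a real LLM call, and `label_preferences` is a hypothetical helper name. The point is only the data flow: an AI judge, rather than a human rater, decides which of two responses is "chosen".

```python
from typing import Callable, List, Tuple

# Hypothetical one-line principle; a real constitution is a longer list.
CONSTITUTION = "Choose the response that is less harmful."

# A judge takes (principle, prompt, response_a, response_b) and returns "A" or "B".
Judge = Callable[[str, str, str, str], str]


def toy_judge(principle: str, prompt: str, a: str, b: str) -> str:
    """Stand-in for an LLM call: treats a response containing 'idiot' as harmful."""
    return "B" if "idiot" in a.lower() else "A"


def label_preferences(
    pairs: List[Tuple[str, str, str]], judge: Judge = toy_judge
) -> List[Tuple[str, str, str]]:
    """Turn (prompt, response_a, response_b) into (prompt, chosen, rejected)."""
    labeled = []
    for prompt, a, b in pairs:
        choice = judge(CONSTITUTION, prompt, a, b)
        chosen, rejected = (a, b) if choice == "A" else (b, a)
        labeled.append((prompt, chosen, rejected))
    return labeled
```

The resulting (prompt, chosen, rejected) tuples have the same shape as human-labeled comparison data, so they could feed a preference model in the usual way; swapping `toy_judge` for a real model call is the part this sketch leaves out.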
https://www.surgehq.ai/case-studies/anthropic-claude-surgeai...