by behnamoh
529 days ago
This was discussed in my paper last year: https://arxiv.org/abs/2406.05587

TL;DR: RLHF results in "mode collapse" of LLMs, reducing their creativity and turning them into agents that have already made up their "mind" about what they're going to say next.