|
|
|
|
|
by cosmojg
929 days ago
|
|
What was the point of moving away from the base model? I can't stop asking this question. Conversational formatting is achievable with careful prompting and a bit of good old-fashioned heuristic post-processing, and it was easier to achieve consistent results before RLHF took off. Now we still have to do a bunch of prompt hacking to get the results we want[1], but it's more complicated and the performance of the model has degraded significantly[2]. All the cargo culting toward agentic chatbots and away from language prediction engines might please the marketing and investor relations departments, but it's only setting us back in the long run. [1] https://arxiv.org/pdf/2310.06452.pdf [2] https://arxiv.org/pdf/2305.14975.pdf |
|
The reward models are kind of forgotten by everyone, but they are substantial transformer models with billions of parameters themselves. I think companies are using RLHF because it really helps align preferences/steer/improve performance.