by stonebraker
886 days ago
I guess for self-evaluation and generation, we'd want to choose a model that performs well at the job. So if the 70B is fine-tuned, it is probably the one to use as judge + augmentor, rather than a generic model.
Also, I think the paper shows the win rate using Mistral Medium on some preliminary benchmark (Table 2). But I liked the idea that the reward model is not static, and if the user is given multiple options, the extra score might help break the tie.