


	by stonebraker 886 days ago
	I guess for self-evaluation and generation, we'd want a model that performs well at the job. So if the 70B model is fine-tuned, it is probably the one to use as judge + augmentor rather than a generic model. Also, I think the paper shows the win rate using Mistral Medium on some preliminary benchmark (Table 2). But I liked the idea that the reward model is not static, and if the user is given multiple options, the extra score might help break the tie.
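	To make the tie-breaking point concrete, here is a rough sketch of what I have in mind. It is not from the paper: `Candidate`, `user_votes`, `judge_score`, and `pick_best` are all made-up names, and the numbers are invented for illustration. The only assumption is that each option carries both a user preference count and a score from the model acting as its own judge.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    text: str
    user_votes: int      # hypothetical: how many users preferred this option
    judge_score: float   # hypothetical: score from the model judging its own output


def pick_best(candidates: list[Candidate]) -> Candidate:
    # Tuples compare element by element, so user votes decide first
    # and the judge score only matters when the votes are tied.
    return max(candidates, key=lambda c: (c.user_votes, c.judge_score))


options = [
    Candidate("response A", user_votes=3, judge_score=3.5),
    Candidate("response B", user_votes=3, judge_score=4.5),
    Candidate("response C", user_votes=1, judge_score=5.0),
]

best = pick_best(options)
print(best.text)
```

	Here A and B tie on votes, so the judge score picks B. C has the highest judge score but loses anyway, since in this sketch the score is only a tie-breaker and never overrides the user.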