DPO fine-tuned Mistral 7B beats Llama 70B on MT Bench

	DPO fine-tuned Mistral 7B beats Llama 70B on MT Bench (huggingface.co)
	3 points by clmnt 990 days ago

2 comments

amilios 990 days ago

Can anyone corroborate this anecdotally? I.e., has anyone actually looked at the output of the two models side by side for common tasks? There's a lot of talk these days about academic benchmarks being pretty "broken" for modern LMs and not properly showcasing the differences between models. I wonder if that's the case here or if the model is genuinely better.

brucethemoose2 989 days ago

I dunno. I totally missed this and will check it out.

- Hugging Face is less likely to "cheat" by training on tests than other orgs, I think.

- Some finetunes are really good at a particular test (like XWin). This isn't necessarily a bad thing if they are good at a specific niche.

brucethemoose2 989 days ago

> <|system|>, <|user|> and <|model|>.

Oh hey, that's almost Metharme's format.

It must originate from an older model, as most new models don't use that syntax.
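
For anyone who hasn't seen this style of prompt, here is a rough Python sketch of how the three role tags quoted above could be stitched into a single prompt string. The tags come from the quote; the `build_prompt` helper, the newline layout, and the trailing open `<|model|>` tag are my own assumptions, not the model's documented template, so check the model card before relying on it.

```python
# Role tags as quoted above. Everything else here is a hypothetical illustration.
ROLE_TAGS = {"system": "<|system|>", "user": "<|user|>", "model": "<|model|>"}


def build_prompt(turns):
    """Render (role, text) pairs as tagged turns and leave a model tag open.

    Assumed layout: each tag on its own line, followed by the turn text.
    """
    parts = []
    for role, text in turns:
        parts.append(f"{ROLE_TAGS[role]}\n{text}")
    # End with an open model tag so the model writes the next turn.
    parts.append(ROLE_TAGS["model"])
    return "\n".join(parts)


prompt = build_prompt([
    ("system", "You are a helpful assistant."),
    ("user", "Hi!"),
])
print(prompt)
```

Whether an end-of-sequence token belongs after each turn depends on the specific model, which is why the sketch leaves it out.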
