

by rvnx 519 days ago

https://x.com/arthurmensch/status/1752737462663684344?s=46

This is a message from the Mistral founders, posted when they accidentally leaked a work-in-progress version that was a fine-tune of LLaMA, and there are a few hints of that.

Like:

> What is the architectural difference between Mistral and Llama? HF Mistral seems the same as Llama except for sliding window attention.

So even their “trained from scratch” models like 7B aren’t that impressive if they just pick the dataset and tweak a few parameters.

1 comment

int_19h 518 days ago

Right, so Mistral accidentally released one internal prototype that was fine-tuned LLaMA. How does it follow from there that their other models are the same? Given that the weights are open, we can look, and nope, it's not the same. They don't even use the same vocabulary!

And I have no idea what you mean by "they just pick the dataset". The LLaMA training set is not publicly available - it's open weights, not open source (i.e. not reproducible).
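For anyone who wants to check the vocabulary point themselves, here is a minimal sketch of one way to compare two tokenizer vocabularies once you have each as a token -> id mapping. The function name `compare_vocabs` and the toy vocabularies are made up for illustration; they are not the real Mistral or LLaMA vocabularies.

```python
def compare_vocabs(vocab_a, vocab_b):
    """Compare two token -> id mappings and summarize how much they overlap."""
    tokens_a, tokens_b = set(vocab_a), set(vocab_b)
    shared = tokens_a & tokens_b
    # Shared tokens that also map to the same id in both vocabularies.
    same_id = {t for t in shared if vocab_a[t] == vocab_b[t]}
    return {
        "size_a": len(vocab_a),
        "size_b": len(vocab_b),
        "shared_tokens": len(shared),
        "shared_with_same_id": len(same_id),
        "identical": vocab_a == vocab_b,
    }


# Toy, made-up vocabularies (assumption: purely illustrative data).
toy_a = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}
toy_b = {"<s>": 0, "</s>": 1, "world": 2, "bonjour": 3}

report = compare_vocabs(toy_a, toy_b)
print(report)
```

With Hugging Face `transformers`, a loaded tokenizer's `get_vocab()` method returns this kind of mapping, so the two real vocabularies could be passed in directly, assuming you have access to both model repositories. Matching vocabulary sizes alone would not settle the question, which is why the sketch also counts shared tokens and matching ids.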
