by rvnx
519 days ago
https://x.com/arthurmensch/status/1752737462663684344?s=46 This is a message from one of Mistral's founders, posted when they accidentally leaked a work-in-progress version that turned out to be a fine-tune of LLaMA. There are a few other hints pointing the same way, for example:

> What is the architectural difference between Mistral and Llama? HF Mistral seems the same as Llama except for sliding window attention.

So even their “trained from scratch” models like 7B aren't that impressive if they just pick the dataset and tweak a few parameters.
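To make the quoted "except for sliding window attention" concrete, here is a minimal sketch of the only difference it names, expressed as boolean attention masks. It assumes a simple formulation where True means "query i may attend to key j"; the function names are hypothetical, and counting the current token inside the window is an assumption, not Mistral's exact implementation.

```python
def causal_mask(n):
    # Full causal attention (Llama-style): token i can see every token j <= i.
    return [[j <= i for j in range(n)] for i in range(n)]


def sliding_window_mask(n, window):
    # Sliding-window causal attention: token i only sees the most recent
    # `window` tokens, i.e. positions j with i - window < j <= i.
    # (Assumption: the current token counts as one of the `window` slots.)
    return [[i - window < j <= i for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    for row in sliding_window_mask(5, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a window at least as long as the sequence, the two masks coincide, which is why the models can look identical apart from this one setting.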
And I have no idea what you mean by "they just pick the dataset". The LLaMA training set is not publicly available - it's open weights, not open source (i.e. not reproducible).