by mrfakename 856 days ago
Note that it's actually "Mistral Next", not "Mixtral Next", so it isn't necessarily a MoE. For example, an early version of Mistral Medium (Miqu) was not a MoE but instead a Llama 70B model. I wonder how many parameters this one has.
1 comments

I know what they were going for with the Mixtral name, but every time I come across it I wonder if they considered just how easily the two might be confused. It seems like a poor branding decision - what if someone expects Mixtral performance but accidentally uses a Mistral model? What if someone wants the low resource usage of e.g. Mistral 7B but tries out Mixtral 8x7B instead? It's especially hard when your colleagues aren't necessarily native English speakers.

There's got to be a better name for such a cool product. Maybe MistralX? MistMix?

I feel like this is not really an issue. I personally lost track of all the llamas, <not>gpts, etc - but if somebody is going to seriously use a certain model, they'll find out soon enough if they're using the wrong one.
It has definitely affected me and my colleagues. Perhaps we didn't waste much time, but it's annoying. Even if it isn't a problem, it really cannot hurt to make the naming easier to understand.
I agree. I also think the Llama naming was confusing - versioning by capitalization? (LLaMA vs Llama)