| In other llm news, Mistral/Yi finetunes trained with a new (still undocumented) technique called "neural alignment" are blasting other models in the HF leaderboard. The 7B is "beating" most 70Bs. The 34B in testing seems... Very good: https://huggingface.co/fblgit/una-xaberius-34b-v1beta https://huggingface.co/fblgit/una-cybertron-7b-v2-bf16 I mention this because it could theoretically be applied to Mistral Moe. If the uplift is the same as regular Mistral 7B, and Mistral Moe is good, the end result is a scary good model. This might be an inflection point where desktop-runnable OSS is really breathing down GPT-4's neck. |
I asked around a bit about the example and it was strangely coherent and focused across the whole conversation. It was really well detecting, where I'm starting a new thread (without clearing a context) or referring to things before.
It caught me off guard as well with this:
> me: What does following mean [content of the docker compose]
> cybertron-7b: In the provided YAML configuration, "following" refers to specifying dependencies
I've never seen any model using my exact wording in quotes in conversation like that.