Hacker News
by CuriouslyC 846 days ago
If Mixtral isn't outperforming ChatGPT 3, you're configuring it wrong. It gives somewhat terse answers by default, but you can easily prompt it to spit out wordy answers of the sort ChatGPT has been aligned to prefer.

Mixtral, aka the 8x7B "sparse mixture of experts" model, is not the same as, e.g., Mistral-7B, which is still very, very good, just not quite hitting the mark on some things.

I still couldn't run Mixtral 8x7B on an M1 MacBook Pro with 32GB RAM, so maybe I am indeed doing it wrong? Or are there better quantized versions available now or..?

32GB isn't quite enough to run a decent quant of Mixtral (on a MacBook). You could try a Q3_K_M, but I'm not sure how lobotomized it would be.
That's not true, the GGUF quants aren't great, but there are exl2 4-bit quantizations floating around that are pretty sweet.
exl2 is Nvidia/AMD only.

But GGUF Mixtral should fit in 32GB... just not with the full 32K context. Long context is very memory-intensive in llama.cpp, at least until they fully implement flash attention and a quantized cache.
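
To put rough numbers on the "fits, but not with the full 32K context" point, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the parameter count and attention shape are taken from the published Mixtral 8x7B config as I understand it, the bits-per-weight figures are approximate averages for each quant type, and the helper names are made up. It ignores llama.cpp's compute buffers and whatever the OS and other apps need out of the 32GB, so treat the totals as lower bounds.

```python
# Rough memory estimate for a quantized Mixtral 8x7B GGUF.
# All constants are assumptions for illustration, not measured values.

GIB = 1024 ** 3

# Assumed model shape (Mixtral 8x7B config, as commonly reported).
TOTAL_PARAMS = 46.7e9  # every expert must be resident, not just the active ones
N_LAYERS = 32
N_KV_HEADS = 8         # grouped-query attention
HEAD_DIM = 128

# Assumed average bits per weight for each quant type (approximate).
BITS_PER_WEIGHT = {"Q3_K_M": 3.5, "Q4_K_M": 4.5}


def weights_gib(quant: str) -> float:
    """Approximate size of the quantized weights in GiB."""
    return TOTAL_PARAMS * BITS_PER_WEIGHT[quant] / 8 / GIB


def kv_cache_gib(n_ctx: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB (fp16 cache by default)."""
    # One K and one V vector per layer, per KV head, per token.
    bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return n_ctx * bytes_per_token / GIB


def total_gib(quant: str, n_ctx: int) -> float:
    """Weights plus KV cache; excludes compute buffers and OS overhead."""
    return weights_gib(quant) + kv_cache_gib(n_ctx)


if __name__ == "__main__":
    for quant in BITS_PER_WEIGHT:
        for n_ctx in (4096, 32768):
            print(f"{quant} @ {n_ctx} ctx: ~{total_gib(quant, n_ctx):.1f} GiB")
```

Under these assumptions the fp16 cache costs 128 KiB per token, so the full 32K context adds about 4 GiB on top of the weights, while a 4K context adds only about 0.5 GiB. A quantized cache would correspond to a smaller `bytes_per_elem` here.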

Fair enough, yeah, I'm talking about GGUF quants only.