HN Mirror



	by CuriouslyC 846 days ago
	If Mixtral isn't outperforming ChatGPT 3, you're configuring it wrong. It gives somewhat terse answers by default, but it's easy enough to prompt it into the wordy answers that ChatGPT has been aligned to prefer.


brunooliv 846 days ago

Mixtral, aka the 8x7B "sparse mixture of experts" model, is not the same as, e.g., Mistral-7B, which is still very, very good, just not quite hitting the mark on some things.

I still couldn't run Mixtral 8x7B on an M1 MacBook Pro with 32GB of RAM, so maybe I am indeed doing it wrong? Or are there better quantized versions available now?


d-z-m 846 days ago

32GB isn't quite enough to run a decent quant of Mixtral (on a MacBook). You could try a Q3_K_M, but I'm not sure how lobotomized it would be.
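
A rough way to sanity-check whether a given quant fits: multiply the parameter count by the effective bits per weight. The sketch below assumes the 46.7B total parameter count reported for Mixtral 8x7B; the bits-per-weight values are illustrative placeholders, not measured figures for any specific GGUF file, and `quant_weight_gib` is a hypothetical helper name.

```python
def quant_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Total parameter count reported for Mixtral 8x7B (all experts included).
MIXTRAL_PARAMS = 46.7e9

# Assumed effective bits per weight. Real quant formats mix block types,
# so check the actual file size of the quant you plan to download.
for bpw in (3.5, 4.5, 5.5):
    size = quant_weight_gib(MIXTRAL_PARAMS, bpw)
    print(f"{bpw} bits/weight: about {size:.1f} GiB of weights")
```

Whatever number comes out, the weights still have to share unified memory with the OS, the KV cache, and compute buffers, so the usable budget on a 32GB machine is noticeably smaller than 32GB.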


CuriouslyC 846 days ago

That's not true. The GGUF quants aren't great, but there are exl2 4-bit quantizations floating around that are pretty sweet.


brucethemoose2 846 days ago

exl2 is Nvidia/AMD only.

But GGUF Mixtral should fit in 32GB... just not with the full 32K context. Long context is very memory-intensive in llama.cpp, at least until they fully implement flash attention and a quantized cache.
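
To put a number on the context cost, here is a back-of-the-envelope KV cache estimate. It is only a sketch: the defaults (32 layers, 8 KV heads, head dimension 128) are assumed from Mixtral's published model config, `kv_cache_gib` is a hypothetical helper, and the 1-byte case ignores the scale overhead a real quantized cache would carry. Compute buffers are not counted.

```python
def kv_cache_gib(
    n_ctx: int,
    n_layers: int = 32,       # assumed from the Mixtral 8x7B config
    n_kv_heads: int = 8,      # grouped-query attention: 8 KV heads (assumed)
    head_dim: int = 128,      # hidden size 4096 / 32 attention heads (assumed)
    bytes_per_elem: float = 2.0,  # 2.0 = fp16 cache, 1.0 = idealized 8-bit cache
) -> float:
    """Approximate KV cache size in GiB for a single sequence."""
    # One key vector and one value vector per layer, per KV head, per token.
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token_bytes * n_ctx / 2**30


for n_ctx in (4096, 32768):
    fp16 = kv_cache_gib(n_ctx)
    q8 = kv_cache_gib(n_ctx, bytes_per_elem=1.0)
    print(f"ctx={n_ctx}: fp16 cache {fp16:.1f} GiB, 8-bit cache {q8:.1f} GiB")
# ctx=4096: fp16 cache 0.5 GiB, 8-bit cache 0.2 GiB
# ctx=32768: fp16 cache 4.0 GiB, 8-bit cache 2.0 GiB
```

Under these assumptions the full 32K context costs about 4 GiB in fp16 on top of the weights, which is why trimming the context or quantizing the cache makes a difference on a 32GB machine.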


d-z-m 846 days ago

Fair enough. Yeah, I'm talking about GGUF quants only.
