| HN Mirror



	by chessgecko 826 days ago
	This is the sparsest model that's been put out in a while (maybe ever; I kinda forget the shapes of Google's old sparse models). This probably won't be a great tradeoff for chat servers, but it could be good for local stuff if you have 512GB of RAM with your CPU.

3 comments

coder543 826 days ago

It has 480B parameters total, apparently. You would only need 512GB of RAM if you were running at 8-bit. It could probably fit into 256GB at 4-bit, and 4-bit quantization is broadly accepted as a good trade-off these days. Still... that's a lot of memory.

EDIT: This[0] confirms 240GB at 4-bit.

[0]: https://github.com/ggerganov/llama.cpp/issues/6877#issue-226...
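The arithmetic behind those figures can be sketched roughly as follows. This is a back-of-the-envelope estimate that assumes memory is dominated by the weights and that every weight is stored at the same bit width; real quantized files also carry scales and metadata, and inference needs extra room for the KV cache. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes.

    Assumption: uniform bit width, no quantization overhead, no KV cache.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9


TOTAL_PARAMS = 480e9  # total parameter count quoted in the comment above

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")
```

At 8-bit this gives 480 GB (hence a 512GB machine), and at 4-bit it gives 240 GB, which matches the number in the linked issue.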


kaibee 826 days ago

I know quantizing larger models seems to be more forgiving, but I'm wondering if that applies less to these extreme-MoE models. It seems to me that it should be more like quantizing a 3B model.


coder543 826 days ago

4-bit is fine for models of all sizes, in my experience.

The only reason I personally don’t quantize tiny models very much is because I don’t have to, not because the accuracy gains from running at 8-bit or fp16 are that great. I tried out 4-bit Phi-3 yesterday, and it was just fine.


refulgentis 826 days ago

Yeah, and usually GPU RAM, unless you enjoy waiting a minute for the context to fill :(


Manabu-eo 826 days ago

Google's old Switch-C transformer [1] had 2048 experts and 1.6T parameters, with only one expert activated for each layer, so it was much more sparse. But it was also severely undertrained, like the other models of that era, and is thus useless now.

1. https://huggingface.co/google/switch-c-2048


imachine1980_ 826 days ago

It performs worse than Llama 3 8B, so you probably don't need that much.


coder543 826 days ago

Where do you see that? This comparison[0] shows it outperforming Llama-3-8B on 5 out of 6 benchmarks. I'm not going to claim that this model looks incredible, but it's not that easily dismissed for a model that has the compute complexity of a 17B model.

[0]: https://www.snowflake.com/wp-content/uploads/2024/04/table-3...
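To put the sparsity in numbers, here is a rough sketch using only the two figures mentioned in this thread: 480B total parameters and the compute of a 17B model. Treating 17B as the active parameter count per token is an assumption, and `active_fraction` is just an illustrative helper.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's parameters that take part in one forward pass."""
    return active_params / total_params


TOTAL_PARAMS = 480e9   # total size quoted earlier in the thread
ACTIVE_PARAMS = 17e9   # assumption: "compute complexity of a 17B model"

frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"active per token: {frac:.1%} of all weights")
```

So all 480B parameters still have to sit in memory, but only about 3.5% of them are touched per token, which is why the memory cost and the compute cost look so different.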
