|
|
|
|
|
by endymi0n
912 days ago
|
|
There’s a brand new hybrid quantization for Mixtral out that uses 4b for shared neurons and 2b for experts, which does not bleed much perplexity, but fits it into a 32G machine. Haven’t had it in hand yet and no link here on mobile, but can’t wait to try. |
|