by anon373839
919 days ago
No, I am referring to the 7Bx8 MoE model. The MoE layers apparently can be sparsified (or equivalently, quantized down to a single bit per weight) with minimal loss of quality. Inference on a quantized model is faster, not slower. However, I have no idea how practical it is to run an LLM on a phone. I think it would run hot and drain the battery.
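To put rough numbers on the bits-per-weight point, here is a back-of-the-envelope sketch. It assumes the naive parameter count implied by the name, 8 x 7 billion = 56 billion, which is my assumption and probably an overestimate if the experts share their non-MoE layers. It also ignores activations, the KV cache, and any quantization metadata. `weight_bytes` and `N_PARAMS` are just illustrative names.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return n_params * bits_per_weight / 8


# Assumption: naive count taken from the "7Bx8" name, not a measured figure.
N_PARAMS = 8 * 7_000_000_000

# Compare half precision, a common 4-bit quantization, and 1 bit per weight.
for bits in (16, 4, 1):
    gib = weight_bytes(N_PARAMS, bits) / 2**30
    print(f"{bits:>2} bits/weight: {gib:6.1f} GiB")
```

Fewer bytes per weight also means fewer bytes to read from memory for each generated token, which is one plausible reason quantized inference can come out faster when it is limited by memory bandwidth.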