|
|
|
|
|
by coder543
62 days ago
|
|
Not true. With a MoE, you can offload quite a bit of the model to CPU without losing a ton of performance. 16GB should be fine to run the 4-bit (or larger) model at speeds that are decent. The --n-cpu-moe parameter is the key one on llama-server, if you're not just using -fit on. |
|