|
|
|
|
|
by zozbot234
125 days ago
|
|
With sparse MoE it's worth running the experts in system RAM since that allows you to transparently use mmap and inactive experts can stay on disk. Of course that's also a slowdown unless you have enough RAM for the full set, but it lets you run much larger models on smaller systems. |
|