by zozbot234
81 days ago
The writeup from the earlier experiment (running on a MacBook Pro) shows quite clearly that expert routing choices are far from uniform, and that some layer-experts are used only rarely. So you can cut the RAM footprint by keeping only the frequently used experts resident, and still swap experts in only rarely.

When individual experts are close in size to the device's entire RAM, that's your only option.
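To illustrate why skewed routing keeps swapping rare, here is a minimal sketch. It assumes an LRU eviction policy and a synthetic Zipf-like routing distribution; neither comes from the experiment mentioned above, and `ExpertCache` and `simulate` are hypothetical names. Each cache miss stands in for loading an expert's weights from storage.

```python
from collections import OrderedDict
import random


class ExpertCache:
    """LRU cache of expert weights held in RAM; a miss stands in for a swap from storage."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.resident = OrderedDict()  # expert id -> placeholder for loaded weights
        self.swaps = 0

    def get(self, expert_id: int):
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)  # mark as most recently used
            return self.resident[expert_id]
        self.swaps += 1  # not resident: would have to be loaded from storage
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used expert
        weights = f"weights[{expert_id}]"  # hypothetical stand-in for a real weight load
        self.resident[expert_id] = weights
        return weights


def simulate(num_experts: int, capacity: int, steps: int, skew: float, seed: int = 0) -> int:
    """Count swaps for a synthetic routing trace; skew=0 means uniform routing."""
    rng = random.Random(seed)
    # Zipf-like weights (an assumption): expert k is picked with probability ~ 1/(k+1)**skew
    weights = [1.0 / (k + 1) ** skew for k in range(num_experts)]
    trace = rng.choices(range(num_experts), weights=weights, k=steps)
    cache = ExpertCache(capacity)
    for expert_id in trace:
        cache.get(expert_id)
    return cache.swaps


if __name__ == "__main__":
    # Keep 16 of 64 experts resident and compare uniform vs. skewed routing.
    print("uniform:", simulate(64, 16, 10_000, skew=0.0))
    print("skewed: ", simulate(64, 16, 10_000, skew=1.5))
```

With uniform routing, a cache holding a quarter of the experts misses on most lookups; with a heavily skewed distribution, most lookups hit the few hot experts that stay resident, so swaps become much less frequent. Real routing statistics would replace the synthetic weights here.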