by TotalCrackpot
919 days ago
Btw, shouldn't it be possible, in theory, to run the Mixtral MoE by loading the submodels one at a time, storing each one's outputs, and then doing the rest of the algorithm? That would make it easier to run on machines that can't fit the whole model in memory.
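A rough sketch of the idea for a single MoE layer, in plain Python. Everything here is a toy assumption, not Mixtral's actual code: each "expert" is a single weight matrix stored as a JSON file (real experts are gated feed-forward blocks), and the names `route`, `moe_layer_sequential`, and `expert_paths` are made up for illustration. The router runs first because it only needs the small gate matrix; the experts are then loaded one at a time and their weighted outputs accumulated.

```python
import json
import math


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def route(gate_w, x, top_k):
    """Pick the top_k experts for token x; weights are a softmax over their logits."""
    logits = matvec(gate_w, x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in top])
    return list(zip(top, weights))


def moe_layer_sequential(tokens, gate_w, expert_paths, top_k=2):
    """Toy MoE layer that holds at most one expert's weights in memory at a time."""
    # Step 1: routing only needs the gate matrix, so do it for all tokens up front.
    routes = [route(gate_w, x, top_k) for x in tokens]
    outputs = [[0.0] * len(x) for x in tokens]

    # Step 2: visit the experts one by one and accumulate their weighted outputs.
    for e, path in enumerate(expert_paths):
        assigned = [(t, w) for t, r in enumerate(routes) for idx, w in r if idx == e]
        if not assigned:
            continue  # no token was routed to this expert, so skip the load
        with open(path) as f:
            expert_w = json.load(f)  # hypothetical on-disk format for this sketch
        for t, w in assigned:
            y = matvec(expert_w, tokens[t])
            outputs[t] = [o + w * yi for o, yi in zip(outputs[t], y)]
        del expert_w  # drop the reference before loading the next expert
    return outputs
```

The catch is that every layer has its own set of experts and each token can pick different ones, so during token-by-token generation this would mean reloading weights at every layer for every new token. Batching tokens per expert like this mostly pays off when processing a whole prompt at once.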