by bugglebeetle
102 days ago
Unfortunately, this looks to cover only the larger MoE models. I imagine the smaller models are what most people would target. The 9B just dropped two days ago, so I'm not surprised it’s not explicitly documented, but it does use a hybrid Mamba architecture that I expect needs some special consideration.