by behnamoh
929 days ago
I think the current approach (train 7b models, then do MoE on them) is the future. It’ll still only be runnable on high-end consumer devices. As for 13b + MoE, I don’t think any consumer device could handle that in the next couple of years.
These aren't totally common configurations, but they're not totally out of reach like buying an H100 for personal use.