Hacker News
by fblgit 929 days ago
Correct. UNA can align the MoE at multiple layers, experts, nearly any part of the neural network, I would say. Xaberius 34B v1 "BETA" is the king, and it's just that: the beta. I'll be focusing on Mixtral; it's a Christmas gift, modular in that way. Thanks for the lab, @mistral!
2 comments

Do a Yi 200K version as well! That would make my Christmas, as the Mistral MoE is only maybe 32K context.
Do you have any docs describing the method?