by viraptor
335 days ago
How well separated are the experts by domain in a model like that? Specifically, if I only care about programming, could we strip it down to one or two experts? Or should I assume a much wider spread? (There would be some overlap anyway, inherited from the original root model.)
See https://github.com/peteryuqin/Kimi-K2-Mini, a project that keeps only a small portion of the experts and layers while retaining the model's capabilities across multiple domains.
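One rough way to answer the "one or two experts vs. a wide spread" question empirically is to run a programming-only prompt set through the model, record which experts the router picks for each token, and see how many experts it takes to cover most of the routing. The sketch below is NOT the Kimi-K2-Mini method. It assumes you have already extracted the top-k expert indices per token for a single MoE layer (the `routing` list is made-up toy data), and `expert_usage` and `experts_for_coverage` are hypothetical helper names. In a real model you would repeat this per layer, because routing differs layer to layer.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def expert_usage(topk_per_token: Iterable[Sequence[int]]) -> Counter:
    """Count how often each expert index was selected by the router (one MoE layer)."""
    counts: Counter = Counter()
    for experts in topk_per_token:
        counts.update(experts)
    return counts


def experts_for_coverage(counts: Counter, coverage: float) -> List[int]:
    """Greedily pick the most-used experts until they cover `coverage` of all routed slots."""
    total = sum(counts.values())
    kept: List[int] = []
    covered = 0
    for expert, n in counts.most_common():
        if covered >= coverage * total:
            break
        kept.append(expert)
        covered += n
    return kept


# Toy data (hypothetical): top-2 expert choices for five tokens of code.
routing = [[0, 3], [0, 3], [0, 5], [1, 3], [0, 2]]
usage = expert_usage(routing)
print(usage.most_common())
print(experts_for_coverage(usage, 0.7))
```

If the list returned for a high coverage target (say 0.9) is only a handful of experts in every layer, pruning for a single domain looks plausible; if it stays close to the full expert count, the spread is wide and stripping down to one or two experts would likely hurt quality.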