by boroboro4
434 days ago
DeepSeek introduced a novel expert training technique that increases expert specialization. Within a given domain, their implementation tends to activate the same experts across different tokens, which is kinda what you’re asking for!