Hacker News
by boroboro4 434 days ago
DeepSeek introduced a novel expert-training technique that increases expert specialization. For a given domain, their implementation tends to activate the same experts across different tokens, which is kinda what you're asking for!