Hacker News
DeepSeekMoE: Expert Specialization in Mixture-of-Experts Language Models (arxiv.org)
1 point by tildef 878 days ago