Hacker News
by rahimnathwani 513 days ago

  If you place experts in different GPUs
Right, this is described in the DeepSeek-V3 paper (section 3.4 on pages 18-20).