Hacker News new | ask | show | jobs
by perbu 59 days ago
MoE is excellent for the unified memory inference hardware like DGX Sparc, Apple Studio, etc. Large memory size means you can have quite a few B's and the smaller experts keeps those tokens flowing fast.