by Alifatisk 300 days ago
I think it's because of a combination of the MoE (mixture-of-experts) model architecture and inference being done in large batches that run in parallel.
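
To make that a bit more concrete, here's a toy sketch of the two effects. It is not how any real serving stack works: the sizes (`NUM_EXPERTS`, `TOP_K`, a batch of 32) and the `route` function are made up for illustration. It only shows that each token touches a fraction of the experts, and that batching lets each expert's weights be reused across many tokens.

```python
# Hypothetical toy sizes, chosen only for illustration.
NUM_EXPERTS = 8   # experts in one MoE layer (assumed)
TOP_K = 2         # experts activated per token (assumed)


def route(token_id):
    """Toy stand-in for a learned router: pick TOP_K experts from the token id."""
    return [(token_id + i) % NUM_EXPERTS for i in range(TOP_K)]


def group_by_expert(batch):
    """Group the tokens in a batch by the experts they are routed to."""
    groups = {e: [] for e in range(NUM_EXPERTS)}
    for tok in batch:
        for e in route(tok):
            groups[e].append(tok)
    return groups


batch = list(range(32))          # 32 tokens processed together
groups = group_by_expert(batch)

# Effect 1 (MoE): each token only uses TOP_K of NUM_EXPERTS experts.
active_fraction = TOP_K / NUM_EXPERTS

# Effect 2 (batching): an expert's weights are loaded once per batch and
# reused for every token routed to it.
tokens_per_expert_load = sum(len(g) for g in groups.values()) / NUM_EXPERTS

print(f"expert weights used per token: {active_fraction:.0%}")
print(f"tokens served per expert weight load: {tokens_per_expert_load}")
```

With a batch of one token, each expert load would serve at most one token, so the bigger the batch, the better the weight reads are amortized.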