Hacker News
by ghughes 1071 days ago
But given the rumored architecture (mixture of experts, MoE), it would make complete sense for them to dynamically scale down the number of expert models used in the mixture during periods of peak load.
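For illustration, here is a toy sketch of what "using fewer experts under load" could look like, assuming top-k routing: a gate scores every expert, only the k highest-scoring experts run, and their outputs are mixed with renormalized gate weights. Everything here is hypothetical: the `choose_k` load policy, its `threshold` value, the scalar "experts", and the fixed gate scores are made up for the example and say nothing about how any real provider actually serves its models.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical toy "expert": maps a scalar input to a scalar output.
Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def choose_k(load: float, k_max: int, k_min: int = 1, threshold: float = 0.8) -> int:
    """Hypothetical policy: run k_max experts normally, drop to k_min at or above the load threshold."""
    return k_min if load >= threshold else k_max


def moe_forward(x: float, experts: Sequence[Expert], gate_scores: Sequence[float], k: int) -> float:
    """Top-k mixture: run only the k best-scored experts and mix their outputs.

    In a real model the router would compute gate_scores from the token itself;
    here they are passed in directly to keep the sketch small.
    """
    probs = softmax(gate_scores)
    # Indices of the k experts with the highest gate probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' weights sum to 1.
    norm = sum(probs[i] for i in top)
    return sum(probs[i] / norm * experts[i](x) for i in top)


if __name__ == "__main__":
    experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
    scores = [0.1, 0.2, 0.3, 2.0]
    for load in (0.5, 0.95):
        k = choose_k(load, k_max=2)
        print(f"load={load} k={k} output={moe_forward(1.0, experts, scores, k)}")
```

Since each selected expert costs roughly one expert's worth of compute per token, going from k=2 to k=1 in this toy roughly halves the expert compute, at the price of a coarser mixture; with the scores above, k=1 simply returns the top expert's output (4.0).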