Hacker News
by Snoozus 87 days ago
Unfortunately no, experts are typically switched out for every token. As I understand it, the idea was to have each expert be good at one kind of task, but that's not how it panned out after training.
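
To make "switched out for every token" concrete, here is a rough sketch of how per-token top-k routing usually works, as far as I understand it. Everything in it is made up for illustration: the router weights are random, `route_token`, `NUM_EXPERTS`, `HIDDEN_DIM` and `top_k=2` are hypothetical names and sizes, and it isn't taken from any particular model's code.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(hidden, router_weights, top_k=2):
    """Pick the top_k experts for ONE token.

    Returns a list of (expert_index, gate_weight) pairs whose
    gate weights are renormalized to sum to 1.
    """
    # One logit per expert: dot product of the token's hidden state
    # with that expert's row of the router matrix.
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in router_weights]
    probs = softmax(logits)
    # Keep only the top_k most probable experts for this token.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


# Hypothetical sizes and random weights, purely for illustration.
rng = random.Random(0)
NUM_EXPERTS, HIDDEN_DIM = 8, 16
router_weights = [[rng.gauss(0, 1) for _ in range(HIDDEN_DIM)]
                  for _ in range(NUM_EXPERTS)]
tokens = [[rng.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(5)]

# The router runs independently for every token position, so the
# selected experts can differ from one token to the next.
routes = [route_token(t, router_weights) for t in tokens]
for pos, route in enumerate(routes):
    print(pos, [(i, round(g, 2)) for i, g in route])
```

The point is that the choice depends only on the current token's hidden state, so nothing ties a whole prompt or task to one expert. In real models this also happens separately at each MoE layer.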