by superjan 87 days ago
That was a very good summary. One detail the post could mention is that the 4 or 10 experts invoked were selected from the 512 experts the model has per layer (to give an idea of the savings).
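To put rough numbers on that, here is a quick back-of-the-envelope sketch. It only uses the figures above (4 or 10 active out of 512 per layer) and assumes all experts are the same size. It ignores attention and any shared layers, so the real end-to-end savings will be smaller than this ratio suggests. The function name is just for illustration.

```python
# Fraction of per-layer experts that are active for a token,
# using the numbers from the comment (4 or 10 out of 512).
# Assumption: every expert has the same parameter count.
TOTAL_EXPERTS_PER_LAYER = 512


def active_fraction(active: int, total: int = TOTAL_EXPERTS_PER_LAYER) -> float:
    """Return the share of experts in a layer that are actually invoked."""
    return active / total


for active in (4, 10):
    frac = active_fraction(active)
    print(f"{active} of {TOTAL_EXPERTS_PER_LAYER} experts: {frac:.2%} of the layer's experts")
```

So with 4 experts about 0.78% of the per-layer experts are touched, and with 10 about 1.95%.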