Hacker News
by superjan, 87 days ago
That was a very good summary. One detail the post could add is that the 4 or 10 experts invoked were selected from the 512 experts the model has per layer, which gives an idea of the savings.
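To put rough numbers on that, here is a minimal sketch assuming plain top-k routing: a router gives each of the 512 experts a score per token, and only the k highest-scoring experts run. The router scores below are random placeholders, and `route_top_k` and `active_fraction` are hypothetical helpers, not the model's actual code. The fraction only covers the expert blocks. Attention and other non-expert parts of each layer still run for every token, so the total compute saving is smaller than these percentages suggest.

```python
import heapq
import random

NUM_EXPERTS = 512  # experts per layer, as stated in the comment


def route_top_k(router_scores: list[float], k: int) -> list[int]:
    """Return the indices of the k highest-scoring experts for one token (assumed top-k routing)."""
    return heapq.nlargest(k, range(len(router_scores)), key=router_scores.__getitem__)


def active_fraction(k: int, num_experts: int = NUM_EXPERTS) -> float:
    """Fraction of a layer's experts that actually run for one token."""
    return k / num_experts


random.seed(0)
# Hypothetical router scores for a single token; a real router computes these from the token's hidden state.
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

for k in (4, 10):
    chosen = route_top_k(scores, k)
    print(f"k={k}: {len(chosen)} of {NUM_EXPERTS} experts run ({active_fraction(k):.2%})")
```

This prints `k=4: 4 of 512 experts run (0.78%)` and `k=10: 10 of 512 experts run (1.95%)`. Under these assumptions, roughly 1 to 2 percent of each layer's expert weights are used for a given token.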