Hacker News
by functional_dev, 69 days ago
This confused me at first as well. Inactive experts skip compute, but their weights are still loaded, so memory does not shrink at all.
I found this visualisation helpful:
https://vectree.io/c/sparse-activation-patterns-and-memory-e...
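
For anyone who prefers numbers, here is a rough back-of-the-envelope sketch of the same point. Everything in it is an assumption for illustration: the `MoEConfig` name, the 8-expert / top-2 setup, and the parameter counts are made up and do not describe any particular model. It also assumes every expert stays resident in memory (no offloading) and lumps all layers together.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical mixture-of-experts setup; all values are illustrative."""
    num_experts: int        # experts stored in the model
    top_k: int              # experts the router activates per token
    params_per_expert: int  # parameters in one expert
    shared_params: int      # attention, embeddings, router, etc.
    bytes_per_param: int = 2  # assumes 16-bit weights


def resident_params(cfg: MoEConfig) -> int:
    # Every expert must be loaded, because any of them may be routed to.
    return cfg.shared_params + cfg.num_experts * cfg.params_per_expert


def active_params(cfg: MoEConfig) -> int:
    # Only the top_k selected experts take part in a token's forward pass.
    return cfg.shared_params + cfg.top_k * cfg.params_per_expert


def resident_bytes(cfg: MoEConfig) -> int:
    # Weight memory follows the resident count, not the active count.
    return resident_params(cfg) * cfg.bytes_per_param


cfg = MoEConfig(
    num_experts=8,
    top_k=2,
    params_per_expert=5_000_000_000,
    shared_params=2_000_000_000,
)

print(f"resident params: {resident_params(cfg):,}")
print(f"active params per token: {active_params(cfg):,}")
print(f"weight memory: {resident_bytes(cfg) / 1e9:.0f} GB")
```

With these made-up numbers, compute per token scales with about 12B parameters while weight memory scales with all 42B. Sparsity saves FLOPs, not RAM.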