Hacker News
by
deyiao
895 days ago
"The MoE architecture uses 20 times the parameters. Is this comparison fair? Can it be compared with a single model that also uses 20 times the parameters?"
1 comment
jiggawatts
895 days ago
It has more parameters in total, but not all of them are used during inference: only a subset of the experts is active for any given token. They compared models that use equal numbers of parameters at inference time.
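
To make the total-versus-active distinction concrete, here is a rough sketch of the arithmetic. The function name, the parameter split, and all the numbers are made up for illustration and are not taken from the paper being discussed; the small router/gating weights are ignored.

```python
def moe_param_counts(shared, per_expert, num_experts, top_k):
    """Return (total, active) parameter counts for a hypothetical MoE model.

    shared      -- parameters every token passes through (attention, embeddings, ...)
    per_expert  -- parameters in one expert
    num_experts -- number of experts stored in the model
    top_k       -- number of experts actually run for each token
    Router weights are ignored here (an assumption to keep the sketch simple).
    """
    total = shared + num_experts * per_expert   # what has to be stored
    active = shared + top_k * per_expert        # what each token actually uses
    return total, active


# Hypothetical sizes, in millions of parameters.
total, active = moe_param_counts(shared=100, per_expert=50, num_experts=20, top_k=1)
print(f"total: {total}M, active per token: {active}M")
```

With these made-up sizes the model stores 1100M parameters but runs only 150M per token, so a dense model of about 150M parameters is the comparison with equal inference-time parameters.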