by deyiao 895 days ago
"The MoE architecture uses 20 times the parameters. Is this comparison fair? Can it be compared with a single model that also uses 20 times the parameters?"
1 comment

It has more parameters in total, but only a subset of them is used for any given input during inference. They compared models that use equal numbers of parameters at inference time.
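
To make the total-versus-used distinction concrete, here is a rough sketch that counts parameters for one feed-forward layer, assuming a top-k routed MoE with a linear router. The function name, the layer sizes, and the choice of 20 experts with top-1 routing are all made up for illustration and are not taken from the paper under discussion.

```python
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    """Return (total, active) weight counts for one MoE feed-forward layer.

    Hypothetical helper: assumes each expert is a two-matrix MLP
    (d_model -> d_ff -> d_model), biases ignored, plus a linear router.
    """
    per_expert = 2 * d_model * d_ff      # up-projection + down-projection
    router = d_model * num_experts       # gating weights, always evaluated
    total = num_experts * per_expert + router
    active = top_k * per_expert + router  # only top_k experts run per token
    return total, active


# Illustrative sizes only (assumed, not from the source).
d_model, d_ff = 1024, 4096
total, active = moe_ffn_params(d_model, d_ff, num_experts=20, top_k=1)
dense = 2 * d_model * d_ff  # a single dense FFN of the same width

print(f"MoE total params:  {total:,}")
print(f"MoE active params: {active:,}")
print(f"Dense FFN params:  {dense:,}")
```

Under these assumptions the MoE layer stores roughly 20 times the weights of the dense layer, but each token only passes through about one dense layer's worth of them (plus the small router), which is the sense in which the compared models use equal numbers of parameters.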