by miven
162 days ago
I'm really glad that these HNet-inspired approaches are getting traction; I'm a big fan of that paper. Though I wonder how much of the gain in this case is actually due to the 75% extra parameters compared to the baseline, even if the inference FLOPs are matched. I can't help but see this as just a different twist on the sparse parameter use idea leveraged by MoE models, which also gain performance at constant forward-pass FLOPs because of their extra parameters.
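To make the MoE comparison concrete, here is a rough back-of-the-envelope sketch of how stored parameters and per-token active parameters diverge in a top-k routed feed-forward layer. The layer sizes, the expert count, and the helper names (`ffn_params`, `moe_stats`) are all made up for illustration; none of them come from the paper under discussion.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a two-matrix feed-forward block (biases ignored)."""
    return 2 * d_model * d_ff


def moe_stats(d_model: int, d_ff: int, num_experts: int, top_k: int):
    """Return (stored, active) weight counts for one top-k routed MoE layer."""
    per_expert = ffn_params(d_model, d_ff)
    router = d_model * num_experts          # linear gate that scores the experts
    stored = num_experts * per_expert + router
    active = top_k * per_expert + router    # weights actually used for one token
    return stored, active


# Hypothetical sizes, chosen only to show the ratio.
dense = ffn_params(1024, 4096)
stored, active = moe_stats(1024, 4096, num_experts=8, top_k=1)

print(f"dense FFN weights:       {dense:,}")
print(f"MoE stored weights:      {stored:,}")
print(f"MoE active weights/token: {active:,}")
print(f"stored / dense ratio:    {stored / dense:.2f}x")
```

Per-token matmul FLOPs scale roughly with the active weights, so under these assumptions the MoE layer costs about the same as the dense one per token while storing about 8x the parameters. That is the same kind of "more parameters at matched FLOPs" situation I suspect is contributing here.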