
I agree, training a dense model with the same number of parameters would be a much bigger feat.

Otherwise, as I mentioned elsewhere on this page, we routinely describe the size of the human brain in terms of its number of synapses (connections), even though they are sparsely activated. Only a small subset of your brain 'lights up' for a given input. The number of parameters (connections) is a perfectly sensible way to measure model size.
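To make the distinction concrete, here is a rough back-of-the-envelope sketch of total versus per-input active parameters for a single sparsely activated feed-forward layer. It assumes a simple mixture-of-experts layout with a linear router and ignores biases; the function name and every number in it are made up for illustration and are not taken from any particular model.

```python
def moe_param_counts(d_model, d_ff, num_experts, top_k):
    """Return (total, active) parameter counts for one hypothetical MoE layer.

    Assumes each expert is a two-matrix feed-forward block (biases ignored)
    and that a linear router picks top_k experts per input.
    """
    per_expert = 2 * d_model * d_ff        # up-projection + down-projection
    router = d_model * num_experts         # router weights, always used
    total = num_experts * per_expert + router
    active = top_k * per_expert + router   # parameters touched by one input
    return total, active


# Illustrative sizes only
total, active = moe_param_counts(d_model=1024, d_ff=4096, num_experts=64, top_k=1)
print(f"total parameters:  {total:,}")
print(f"active per input:  {active:,}")
print(f"fraction active:   {active / total:.3%}")
```

With one expert and `top_k=1` the two counts coincide, which is the dense case. The total is still what you have to store and train, which is why I think it is a fair measure of size.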

Anyway, I expect we will see much larger models of both kinds, sparsely and densely activated, going forward. We live in interesting times :-)