
> The only potential benefit

Other benefits:

1. Significantly lower dimensionality of internal representations
2. More interpretable (see: https://toponets.github.io)

> 7B model down to 6B

We remove ~80% of the parameters in the topographic layers while retaining the same model performance. The drop in overall parameter count is modest because we did not experiment with applying TopoLoss to all of the model's layers (that did not align with the goal of the paper).
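To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes the quoted "7B down to 6B" figures are exact and that all of the removed parameters come from the topographic layers; the function name and the rounded inputs are illustrative, not numbers from the paper.

```python
def implied_topographic_params(total_before, total_after, pruned_fraction):
    """Estimate how many parameters sat in the pruned layers.

    Assumes every removed parameter came from those layers and that
    `pruned_fraction` of their parameters was removed.
    """
    removed = total_before - total_after
    return removed / pruned_fraction


# Rounded figures taken from the thread (assumed exact for illustration).
total_before = 7e9      # "7B model"
total_after = 6e9       # "down to 6B"
pruned_fraction = 0.8   # "~80% of the parameters in topographic layers"

topo = implied_topographic_params(total_before, total_after, pruned_fraction)
share = topo / total_before

print(f"{topo / 1e9:.2f}B parameters in topographic layers "
      f"({share:.0%} of the model)")
```

Under those assumptions, only about 1.25B of the 7B parameters were in topographic layers to begin with, which is why an ~80% cut there shows up as a fairly small change in the total.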

We are currently performing those strong sparsity experiments internally, and the results look very promising!