Cool method. Pre deep learning there was plenty of interesting research on sparse methods. What do you think we're missing to have more widely used neural+sparse approaches?
I think the lack of efficient GPU kernels was the main problem. It is much, much easier to get a real speedup and memory reduction from quantization from fp16 to fp8 than from 50% sparsity. For sparsity you needed structure (which makes your model worse) and special hardware support.