I am skeptical of generic sparsification efforts. After all, companies like Neural Magic spent years trying to make it work, only to pivot to 'vLLM' engine and be sold to Red Hat
Link shows this isn't sparsity as in inference speed, it's spare autoencoders, as in interpreting the features in an LLM (SAE anthropic as a search term will explain more)