https://proceedings.mlsys.org/paper/2021/hash/3636638817772e...
Seems mostly about tuning the original idea rather than expanding its scope, but it's still a neat idea. I guess many of the approximations used in SLIDE could be adapted to GPUs too, though...