Exactly right that this targets a narrower surface to enable many deep learning models. I wonder how uncommon it is to hit an operation that isn't included, though? Judging by the PyTorch MPS tracking issue, it seems pretty common:

https://github.com/pytorch/pytorch/issues/77764

NVIDIA's moat is not just in providing BLAS++ operations, but in extending this to a wider range of libraries: cuSPARSE, cuSOLVER, cuTENSOR, etc. Without these, it feels like Apple is just playing catch-up with whatever is popular and unsupported...