by thanatosmin 920 days ago
It looks like this is still missing many matrix operations, such as QR, SVD, and einsum. Is there a clear route to using these on the GPU in Python on Apple Silicon? Last I checked, the PyTorch backend was still missing at least QR...
1 comment

Factorization methods are relatively uncommon in deep learning (the likely target of this framework) and have compute properties (such as approximate outputs and a non-deterministic number of iterations) that set them apart from the standard BLAS++ APIs.
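
To make the "non-deterministic number of iterations" point concrete, here is a minimal pure-Python sketch of power iteration, a simple iterative eigenvalue method. It is only an illustration and not what any GPU library actually ships; the function name, tolerance, and test matrices are all assumptions of mine.

```python
import math


def power_iteration(matrix, tol=1e-10, max_iters=10_000):
    """Estimate the dominant eigenvalue of a square matrix (list of lists).

    Returns (eigenvalue_estimate, iterations_used). The iteration count
    depends on the input data, not just on the matrix size.
    """
    n = len(matrix)
    vec = [1.0] * n  # arbitrary starting vector (an assumption of this sketch)
    prev = 0.0
    est = prev
    for it in range(1, max_iters + 1):
        # Matrix-vector product, then normalize to unit length.
        nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
        # Rayleigh quotient v^T A v gives the current eigenvalue estimate.
        mv = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        est = sum(v * m for v, m in zip(vec, mv))
        # Stop once successive estimates agree to within the tolerance.
        if abs(est - prev) < tol:
            return est, it
        prev = est
    return est, max_iters


# Same shape, same dominant eigenvalue, very different amounts of work.
fast_val, fast_iters = power_iteration([[10.0, 0.0], [0.0, 1.0]])
slow_val, slow_iters = power_iteration([[10.0, 0.0], [0.0, 9.0]])
```

Both matrices are 2x2, but the second one takes far more iterations because its eigenvalues are close together. A fixed-cost GEMM kernel never has that problem, which is part of why these routines sit awkwardly next to BLAS-style APIs.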

einsum seems like a reasonable thing to request, but it's hard to make it performant across the entire surface the operation exposes.
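
As a rough illustration of how wide that surface is, here is a naive pure-Python einsum sketch. The helper names are hypothetical, it only handles explicit `->` specs on nested lists, and it is nothing like how NumPy or PyTorch implement it. The point is that one spec string can mean a trace, a transpose, a matmul, or an outer product, and each of those wants a different optimized kernel.

```python
from itertools import product


def shape_of(tensor):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(tensor, list):
        shape.append(len(tensor))
        tensor = tensor[0]
    return shape


def get(tensor, index):
    """Read one element of a nested list given a sequence of indices."""
    for i in index:
        tensor = tensor[i]
    return tensor


def naive_einsum(spec, *operands):
    """Evaluate an explicit einsum spec such as 'ij,jk->ik' by brute force."""
    inputs, output = spec.split("->")
    in_labels = inputs.split(",")

    # Record the size of every index label from the operand shapes.
    sizes = {}
    for labels, op in zip(in_labels, operands):
        for label, dim in zip(labels, shape_of(op)):
            sizes[label] = dim

    # Labels that do not appear in the output are summed over.
    summed = [label for label in sizes if label not in output]

    def entry(out_idx):
        bound = dict(zip(output, out_idx))
        total = 0
        for s_idx in product(*(range(sizes[label]) for label in summed)):
            bound.update(zip(summed, s_idx))
            term = 1
            for labels, op in zip(in_labels, operands):
                term *= get(op, [bound[label] for label in labels])
            total += term
        return total

    def build(prefix, remaining):
        # Recursively build the nested output list, one axis at a time.
        if not remaining:
            return entry(prefix)
        return [build(prefix + (i,), remaining[1:])
                for i in range(sizes[remaining[0]])]

    return build((), output)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
matmul = naive_einsum("ij,jk->ik", a, b)
trace = naive_einsum("ii->", a)
transpose = naive_einsum("ij->ji", a)
outer = naive_einsum("i,j->ij", [1, 2], [3, 4])
```

A fast backend would have to recognize each of these patterns (and many more, with arbitrary ranks and operand counts) and dispatch to a suitable kernel, rather than run a generic loop like this one.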

Exactly right that this targets a narrower surface to enable many deep learning models. I wonder how uncommon it really is to hit an operation that is not included, though? Judging from the PyTorch MPS tracking issue, it seems pretty common:

https://github.com/pytorch/pytorch/issues/77764

NVIDIA's moat is not just in providing BLAS++ operations, but in extending them to a wider range of libraries: cuSPARSE, cuSOLVER, cuTENSOR, etc. Without these, it feels like Apple is just trying to play catch-up with whatever is popular and unsupported...