by brrrrrm
920 days ago
Factorization methods are relatively uncommon in deep learning (the likely target of this framework), and they have compute properties (such as approximate outputs and a non-deterministic number of iterations) that make them unlike the standard BLAS++ APIs. einsum seems like a reasonable thing to request, but it's hard to be performant across the entire surface the operation exposes.
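To make the "entire surface" point concrete, here is a minimal sketch using NumPy's `np.einsum` (NumPy is an assumption, standing in for any einsum-capable framework). A handful of subscript strings already span GEMM, batched GEMM, pure data movement, reductions, and outer products, each of which a backend would ideally lower to a different, well-tuned kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))
sq = rng.standard_normal((4, 4))
batch_a = rng.standard_normal((2, 3, 4))
batch_b = rng.standard_normal((2, 4, 5))
v = rng.standard_normal(4)

# One API, many kernel families: each spec below would ideally map to
# a different optimized implementation in a performant backend.
results = {
    "matmul": np.einsum("ij,jk->ik", a, b),                          # GEMM
    "batched_matmul": np.einsum("bij,bjk->bik", batch_a, batch_b),  # batched GEMM
    "transpose": np.einsum("ij->ji", a),                             # data movement only
    "trace": np.einsum("ii->", sq),                                  # diagonal reduction
    "outer": np.einsum("i,j->ij", v, v),                             # outer product
    "row_sum": np.einsum("ij->i", a),                                # axis reduction
}
```

Making every one of these (plus arbitrary mixes of them) competitive with a dedicated kernel is what makes a fast, general einsum hard.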
https://github.com/pytorch/pytorch/issues/77764
NVIDIA's moat is not just in providing BLAS++ operations, but in extending this to a wider range of libraries: cuSPARSE, cuSOLVER, cuTENSOR, etc. Without these, it feels like Apple is just trying to play catch-up with whatever is popular and unsupported...