|
|
|
|
|
by Lockal
748 days ago
|
|
I think you massively underestimate the complexity of pytorch. Even if we exclude all GPUs except for AMD, and exclude clang (required for AOT engine), pytorch depends on almost every ROCm library. And inside it depends on original Triton library, and on forked Triton, and on aotriton, which depends on forked MLIR (because AMD MLIR don't contribute these changes to upstream), which depends on another forked LLVM/Clang (because LLVM api is not stable enough for them, I guess). And then there is MIOpen/rocBLAS/hipBLASlt/hipSOLVER/rocFFT/etc - libraries with gigabytes (!) of autogenerated code. Additionally, there are dozens of smaller linked libraries like oneDNN, LIBXSMM, magma, numpy, openBLAS, all needed for running "things". So even without autogenerated code, consider multiplying 1.5 million LOC to 100. |
|