tl;dr: Meta is open-sourcing AITemplate, an inference engine for both NVIDIA and AMD GPUs. Code: https://github.com/facebookincubator/AITemplate. AITemplate delivers much better perf (1.9x to 12.8x) than PyTorch eager mode on SOTA models, including BERT, ResNet, ViT, and Stable Diffusion. It also delivers high perf numbers on AMD GPUs: with AITemplate, the MI250 achieves 80% to 96% of A100 perf on various ResNet / BERT / ViT models. AITemplate uses sophisticated fusion techniques to optimize perf, including vertical, horizontal, and memory fusion. Btw, I'm one of the authors of AITemplate and happy to answer any questions.
Edit: link for TVM: https://tvm.apache.org/
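
For anyone unfamiliar with the term, here is a rough idea of what "vertical fusion" means in general. This is a toy pure-Python sketch, not AITemplate's API or its generated GPU code: the function names and the linear + bias + ReLU example are my own assumptions for illustration. The unfused version runs three separate passes and materializes an intermediate buffer after each one, while the fused version applies the bias and activation as an epilogue of the matrix-vector product, so nothing intermediate is written out.

```python
def linear_bias_relu_unfused(x, w, b):
    """Three separate passes, each producing a full intermediate buffer.

    x: input vector, w: weights as a list of rows (in_features x out_features),
    b: bias vector. All names here are hypothetical, for illustration only.
    """
    n_in, n_out = len(x), len(b)

    # Pass 1: matrix-vector product, written to an intermediate buffer.
    matmul = []
    for j in range(n_out):
        acc = 0.0
        for i in range(n_in):
            acc += x[i] * w[i][j]
        matmul.append(acc)

    # Pass 2: bias add, reads the first buffer and writes a second one.
    biased = [m + bj for m, bj in zip(matmul, b)]

    # Pass 3: ReLU, reads the second buffer and writes the final output.
    return [max(0.0, v) for v in biased]


def linear_bias_relu_fused(x, w, b):
    """One pass: bias and ReLU are applied while the accumulator is still live."""
    n_in, n_out = len(x), len(b)
    out = []
    for j in range(n_out):
        acc = 0.0
        for i in range(n_in):
            acc += x[i] * w[i][j]
        # Epilogue: no intermediate buffer is ever stored.
        out.append(max(0.0, acc + b[j]))
    return out


if __name__ == "__main__":
    x = [1.0, 2.0]
    w = [[1.0, -1.0], [0.5, -2.0]]
    b = [0.5, 1.0]
    print(linear_bias_relu_unfused(x, w, b))
    print(linear_bias_relu_fused(x, w, b))
```

Both functions compute the same result. On a GPU the fused form is typically faster because it avoids extra kernel launches and round trips to global memory for the intermediate buffers.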