by Narew 1361 days ago
How does AITemplate's performance compare to state-of-the-art inference engines like TVM or ONNX Runtime? Does AITemplate optimize/quantize the network?

Edit: link for TVM https://tvm.apache.org/

2 comments

AITemplate only supports fp16 data types with fp16 or fp32 accumulation right now. We are working on supporting more data types and quantization.
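To illustrate why the accumulation type matters, here is a minimal pure-Python sketch that simulates a dot product over fp16 inputs with either an fp16 or an fp32 running sum. It is not AITemplate code: the helpers `to_fp16`, `to_fp32`, and `dot` are hypothetical names, and the rounding is emulated with the standard `struct` module rather than done on a GPU.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product of fp16 inputs; `accumulate` rounds the running sum
    to the simulated accumulator precision after every addition."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = accumulate(acc + to_fp16(x) * to_fp16(y))
    return to_fp16(acc)  # the result is stored as fp16 in both cases


ones = [1.0] * 4096
print("fp16 accumulation:", dot(ones, ones, to_fp16))  # stalls at 2048.0
print("fp32 accumulation:", dot(ones, ones, to_fp32))  # reaches 4096.0
```

With an fp16 accumulator the sum stops growing at 2048, because fp16 has an 11-bit significand and 2048 + 1 rounds back to 2048. The fp32 accumulator reaches the exact 4096, which is still representable when written back as fp16.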

We don't have an official comparison between AITemplate and TVM / ONNX Runtime for now, but we do have perf numbers at https://github.com/facebookincubator/AITemplate/tree/main/ex... and https://github.com/facebookincubator/AITemplate/tree/main/ex.... Feel free to run these examples on other frameworks and compare perf.
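If you do run such a comparison, a small framework-agnostic timing harness keeps the numbers comparable. The sketch below is only an assumption about how one might measure latency, not the method used for the numbers linked above. `benchmark` and `run_once` are hypothetical names, and `run_once` stands in for a single inference call in whichever framework is under test.

```python
import statistics
import time
from typing import Callable


def benchmark(run_once: Callable[[], object], warmup: int = 10, iters: int = 100) -> dict:
    """Time `run_once` and report latency statistics in milliseconds."""
    # Warm-up runs absorb one-time costs (compilation, caches, allocations).
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "iters": iters,
    }


if __name__ == "__main__":
    # Stand-in CPU workload; replace with one inference call per framework.
    print(benchmark(lambda: sum(i * i for i in range(10_000))))
```

For GPU frameworks, `run_once` has to wait for the device to finish (synchronize) before returning, otherwise the harness only times the kernel launch. Use the same batch size, input shapes, and data type for every framework being compared.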

I'd love to hear about this too: especially after running the model through an onnx optimizer, like this one [0].

[0] https://github.com/daquexian/onnx-simplifier