Hippo is faster than AITemplate and supports more generative models. We haven't compared against TVM, but in absolute tokens/s, Hippo running LLaMA decoding on an M2 Max reaches performance comparable to datacenter-level GPUs running other software.
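The difference in memory bandwidth between the M2 Max and datacenter GPUs isn't that large (less than a factor of 5), but the difference in compute is much, much larger. That matters because single-batch decoding is dominated by matrix-vector products (GEMV), which are memory-bound, while other workloads such as prompt prefill are dominated by matrix-matrix products (GEMM), which are compute-bound. If you only have fast GEMV kernels and not fast GEMM kernels, you're basically locked into an inference engine that can only run GPT-style transformers efficiently. Such an engine can *technically* support all the models, but at what ALU utilization?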
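A back-of-the-envelope roofline sketch of that argument, for illustration only. The two device profiles below are hypothetical placeholders chosen to have a 5x bandwidth gap and a 20x compute gap; they are not measured specs of the M2 Max or of any particular GPU. The sketch assumes fp16 operands, each moved through memory exactly once, and a 4096-wide layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    peak_tflops: float    # peak fp16 compute in TFLOP/s (hypothetical placeholder)
    bandwidth_gbs: float  # memory bandwidth in GB/s (hypothetical placeholder)


def arithmetic_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n], assuming each operand moves once."""
    flops = 2 * m * n * k
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_tflops(dev: Device, intensity: float) -> float:
    """Roofline model: throughput is capped by compute or by bandwidth * intensity."""
    memory_roof = dev.bandwidth_gbs * intensity / 1000.0  # GFLOP/s -> TFLOP/s
    return min(dev.peak_tflops, memory_roof)


# Hypothetical devices: 5x apart in bandwidth, 20x apart in compute.
laptop = Device("laptop-class (hypothetical)", peak_tflops=15.0, bandwidth_gbs=400.0)
datacenter = Device("datacenter (hypothetical)", peak_tflops=300.0, bandwidth_gbs=2000.0)

# (m, n, k) for one 4096x4096 weight matrix.
shapes = {
    "GEMV (decode, batch 1)": (1, 4096, 4096),
    "GEMM (prefill / large batch)": (4096, 4096, 4096),
}

ratios = {}
for label, (m, n, k) in shapes.items():
    ai = arithmetic_intensity(m, n, k)
    a = attainable_tflops(laptop, ai)
    b = attainable_tflops(datacenter, ai)
    ratios[label] = b / a
    print(f"{label}: intensity={ai:.1f} FLOP/B, "
          f"laptop={a:.2f} TFLOP/s ({a / laptop.peak_tflops:.1%} of peak), "
          f"datacenter={b:.2f} TFLOP/s, gap={b / a:.1f}x")
```

Under these assumptions the GEMV case sits at roughly 1 FLOP/byte, so both devices are memory-bound and the gap collapses to the bandwidth ratio (5x), with the laptop using only a few percent of its ALUs. The GEMM case is compute-bound on both, so the gap widens to the full compute ratio (20x).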