HN Mirror



	by huevosabio 1114 days ago
	Any idea how Hippo, AITemplate, and TVM compare in performance?

1 comments

antinucleon 1114 days ago

Hippo is faster than AITemplate and supports more generative models. We haven't compared it against TVM, but in absolute tokens/s on an M2 Max, Hippo can run LLaMA decoding at performance comparable to datacenter-level GPUs running other software.


philipturner 1112 days ago

The difference in bandwidth between the M2 Max and datacenter GPUs isn't that large (less than a factor of 5). The difference in compute is much, much larger. If you only have fast GEMV kernels and not fast GEMM kernels, you're basically locked into an inference engine that can only run GPT-style transformers efficiently. It can *technically* support all the models, but at what ALU utilization?
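
To make the bandwidth-versus-compute point concrete, here is a minimal roofline-style sketch (not from the thread). The function names and the two device profiles are hypothetical, in arbitrary units, chosen only so that the bandwidth gap is under 5x while the compute gap is much larger. They are not measurements of an M2 Max or of any datacenter GPU.

```python
# Roofline-style sketch: why GEMV hides a compute gap and GEMM exposes it.
# All device numbers below are made-up relative units, not benchmarks.

def gemv_intensity(n, bytes_per_elem=2):
    """FLOPs per byte for y = A @ x with an n x n matrix (fp16 by default)."""
    flops = 2 * n * n                              # one multiply-add per matrix element
    bytes_moved = bytes_per_elem * (n * n + 2 * n) # read A and x, write y
    return flops / bytes_moved

def gemm_intensity(n, bytes_per_elem=2):
    """FLOPs per byte for C = A @ B with n x n matrices, assuming ideal reuse."""
    flops = 2 * n ** 3
    bytes_moved = bytes_per_elem * 3 * n * n       # read A and B, write C once
    return flops / bytes_moved

def attainable_flops(intensity, peak_flops, bandwidth):
    """Roofline bound: limited by either compute or memory traffic."""
    return min(peak_flops, bandwidth * intensity)

# Hypothetical devices: 4x bandwidth gap, 20x compute gap (assumed values).
SMALL = {"peak_flops": 10.0, "bandwidth": 1.0}
BIG = {"peak_flops": 200.0, "bandwidth": 4.0}

def speedup(intensity):
    """How much faster BIG is than SMALL at a given arithmetic intensity."""
    return attainable_flops(intensity, **BIG) / attainable_flops(intensity, **SMALL)

if __name__ == "__main__":
    n = 4096
    print("GEMV intensity:", gemv_intensity(n), "speedup:", speedup(gemv_intensity(n)))
    print("GEMM intensity:", gemm_intensity(n), "speedup:", speedup(gemm_intensity(n)))
```

Under these assumed numbers, GEMV sits below 1 FLOP/byte, so both devices are memory-bound and the gap between them equals the bandwidth ratio. GEMM's intensity grows with n, so both devices become compute-bound and the gap widens to the compute ratio.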


huevosabio 1114 days ago

Thanks, I've added myself to the waitlist. Please let us know when this can be tried!


choppaface 1113 days ago

How does Hippo compare with TensorRT?
