| HN Mirror



	by antinucleon 1114 days ago
	Hippo is faster than AITemplate and supports more generative models. We haven't compared it against TVM, but in absolute tokens/s on an M2 Max, Hippo can run LLAMA decoding at the performance of datacenter-level GPUs (running other software).

3 comments

philipturner 1112 days ago

The difference in bandwidth between the M2 Max and data center GPUs isn't that large (less than a factor of 5). The difference in compute is much, much larger. If you only have fast GEMV kernels, and not fast GEMM kernels, you're basically locked into an inference engine that can only run GPT-style transformers efficiently. It can *technically* support all the models, but at what ALU utilization?
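
To make the GEMV vs. GEMM point concrete, here is a rough sketch of arithmetic intensity (FLOPs per byte moved) for the two cases. The matrix sizes, the 2-byte (fp16) element size, and the assumption that each matrix crosses the memory bus exactly once are all illustrative choices, not numbers from this thread or from Hippo.

```python
def arithmetic_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for C[m,n] = A[m,k] @ B[k,n].

    Assumes (hypothetically) that A and B are each read once and C is
    written once, with no cache reuse beyond that.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Single-token decoding: one activation row times a weight matrix (GEMV).
gemv = arithmetic_intensity(m=1, n=4096, k=4096)

# Batched or prompt-style work: many rows times the same weights (GEMM).
gemm = arithmetic_intensity(m=512, n=4096, k=4096)

print(f"GEMV: {gemv:.2f} FLOPs/byte")
print(f"GEMM: {gemm:.2f} FLOPs/byte")
```

At roughly 1 FLOP per byte, the GEMV case is limited by memory bandwidth, so the bandwidth gap sets the speed difference. The GEMM case does hundreds of FLOPs per byte, so the compute gap dominates instead.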


huevosabio 1114 days ago

Thanks, I've added myself to the waitlist. Please let us know when this can be tried!


choppaface 1114 days ago

How does Hippo compare with TensorRT?
