by antinucleon
1114 days ago
Hippo is faster than AITemplate and supports more generative models. We haven't compared it against TVM, but in absolute tokens/s on an M2 Max, Hippo can run LLAMA decoding at performance comparable to datacenter-level GPUs (running other SW).