by gwenzek
632 days ago
It's a bit early to compare directly to TensorRT because we don't have a full-blown equivalent. Note that our focus is on being platform-agnostic, easy to deploy and integrate, performant across the board, and easy to tweak.
We are using the same compiler as JAX, so our performance is on par.
But in general, we believe we can gain on overall "tok/s/$" through shorter startup times, choosing the most efficient hardware available, and easily implementing new tricks like multi-token prediction.