by ffriend 763 days ago
JAX requires a bit more work to maintain the fixed-size buffers that XLA needs, especially for caching and rotary embeddings. But yeah, overall the code can be pretty similar [1].

[1]: https://github.com/dfdx/fabrique/blob/main/fabrique/llama/mo...
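
To illustrate the fixed-size buffer point, here is a minimal sketch of a preallocated cache in JAX. It is not taken from the linked repo; `MAX_LEN`, `HEAD_DIM`, `append_to_cache` and `attention_scores` are hypothetical names made up for the example, and it assumes a single head with no batch dimension. The idea is that the cache never changes shape: new entries are written in place with `lax.dynamic_update_slice`, and unfilled slots are masked out instead of being sliced off.

```python
import jax
import jax.numpy as jnp
from jax import lax

MAX_LEN = 8   # hypothetical fixed cache capacity
HEAD_DIM = 4  # hypothetical per-head feature size


@jax.jit
def append_to_cache(cache, new_kv, pos):
    """Write new_kv into the preallocated cache starting at row pos."""
    pos = jnp.asarray(pos, dtype=jnp.int32)
    zero = jnp.zeros((), dtype=jnp.int32)
    # Output shape equals the cache shape, so the compiled function
    # is reused for every decoding step instead of being retraced.
    return lax.dynamic_update_slice(cache, new_kv, (pos, zero))


@jax.jit
def attention_scores(q, cache, length):
    """Score q against the cache, masking slots that are not filled yet."""
    scores = cache @ q                    # shape (MAX_LEN,)
    valid = jnp.arange(MAX_LEN) < length  # mask with a static shape
    return jnp.where(valid, scores, -jnp.inf)


cache = jnp.zeros((MAX_LEN, HEAD_DIM))
cache = append_to_cache(cache, jnp.ones((1, HEAD_DIM)), 0)
cache = append_to_cache(cache, 2.0 * jnp.ones((1, HEAD_DIM)), 1)
scores = attention_scores(jnp.ones(HEAD_DIM), cache, 2)
```

In eager-style code you would typically just concatenate the new keys/values onto the cache; under `jit` that would produce a new shape (and a recompile) at every step, which is why the mask-plus-fixed-buffer pattern shows up instead.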