Hacker News
by ffriend 763 days ago
JAX takes a bit more work because XLA requires fixed-size buffers, and you have to maintain those yourself, especially for caching and rotary embeddings. But yeah, overall the code can be pretty similar [1].
[1]: https://github.com/dfdx/fabrique/blob/main/fabrique/llama/mo...
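
To make the fixed-size buffer point concrete, here's a minimal sketch of a preallocated cache in JAX. This is not taken from the linked repo; `MAX_LEN`, `DIM`, `init_cache`, `update_cache` and `valid_mask` are hypothetical names made up for illustration. The idea is to allocate the cache at its maximum length once, write new entries in place with `lax.dynamic_update_slice`, and track the filled part with a mask instead of growing the array.

```python
import jax
import jax.numpy as jnp
from jax import lax

MAX_LEN = 8  # hypothetical maximum sequence length of the cache
DIM = 4      # hypothetical per-token feature size


def init_cache():
    # Allocate the whole buffer up front so its shape never changes.
    return jnp.zeros((MAX_LEN, DIM), dtype=jnp.float32)


@jax.jit
def update_cache(cache, new_kv, pos):
    # Write new_kv (shape [T, DIM]) into the cache starting at row `pos`.
    # `pos` is a traced value, so changing it does not trigger recompilation;
    # only a new shape of `new_kv` does.
    pos = jnp.asarray(pos, dtype=jnp.int32)
    zero = jnp.zeros((), dtype=jnp.int32)
    return lax.dynamic_update_slice(cache, new_kv, (pos, zero))


@jax.jit
def valid_mask(pos):
    # True for filled slots (index < pos), False for the unused padding.
    return jnp.arange(MAX_LEN) < pos


cache = init_cache()
cache = update_cache(cache, jnp.ones((3, DIM)), 0)      # "prefill" 3 tokens
cache = update_cache(cache, 2 * jnp.ones((1, DIM)), 3)  # "decode" 1 token
mask = valid_mask(4)                                    # 4 slots are filled
```

In attention you would then apply `mask` to the scores so the padded slots are ignored. A rotary embedding table can presumably be handled the same way: precompute it for `MAX_LEN` positions and read the rows you need with `lax.dynamic_slice`, rather than building it for the current length on each call.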