by bjt12345 633 days ago
> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

I thought PyTorch didn't work well on AMD hardware, and I've read that many people use JAX instead?