| HN Mirror

	by bjt12345 633 days ago
	> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.

	I thought PyTorch didn't work well with AMD architecture, and I've read of many people using JAX instead?