by rfoo 480 days ago
For FlashMLA? No. The code here runs on a single GPU only and does not have a built-in communication part.

But training does need it. You have to communicate gradients between GPUs.
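
To make the gradient exchange concrete, here is a rough sketch of what a data-parallel step has to do. It is a pure-Python simulation, not real multi-GPU code: each "GPU" is just a list of floats, and `all_reduce_mean` is a hypothetical helper name standing in for the collective operation a real framework would provide. The gradient values are made up for illustration.

```python
def all_reduce_mean(local_grads):
    """Average gradients element-wise across workers.

    local_grads: one list of floats per simulated GPU, all the same length.
    Returns one copy of the averaged gradients per worker, so every
    worker ends up applying the same update.
    """
    num_workers = len(local_grads)
    # zip(*local_grads) groups the i-th gradient from every worker together.
    averaged = [sum(vals) / num_workers for vals in zip(*local_grads)]
    # Each worker receives its own copy of the result.
    return [list(averaged) for _ in range(num_workers)]


# Hypothetical gradients for a 3-parameter model on 2 simulated GPUs,
# each computed from a different slice of the batch.
grads = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
synced = all_reduce_mean(grads)
```

On real hardware this exchange goes over the interconnect between GPUs, which is why it matters for training and not for a single-GPU kernel.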