Hacker News
rfoo, 480 days ago:
For FlashMLA? No. The code here runs on a single GPU only and does not have a built-in communication component.
pk-protect-ai, 472 days ago:
But for training, it does: you need to communicate gradients between GPUs.
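A minimal, simulated sketch of that point in plain Python (not FlashMLA code, and no real GPUs involved): each "GPU" is just a list entry holding one data shard and its own copy of a one-parameter model y = w * x. The helper names `local_gradient`, `all_reduce_mean`, and `data_parallel_step` are hypothetical. The all-reduce here is a naive in-process average; real multi-GPU training usually hands this step to a collective-communication library such as NCCL, which is the kind of communication component the single-GPU kernel code does not include.

```python
from typing import List, Tuple

Shard = List[Tuple[float, float]]  # (x, y) pairs held by one simulated GPU


def local_gradient(w: float, shard: Shard) -> float:
    """d/dw of the mean squared error for the model y_hat = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> List[float]:
    """Naive all-reduce (simulated): every worker receives the average of all values.
    Real multi-GPU setups would do this via a collective library such as NCCL."""
    mean = sum(values) / len(values)
    return [mean] * len(values)


def data_parallel_step(weights: List[float], shards: List[Shard], lr: float) -> List[float]:
    # 1. Each simulated GPU computes a gradient on its own data shard.
    grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
    # 2. Gradients are exchanged so every GPU holds the same averaged gradient.
    synced = all_reduce_mean(grads)
    # 3. Each GPU applies the identical update, so the replicas stay in sync.
    return [w - lr * g for w, g in zip(weights, synced)]


if __name__ == "__main__":
    # Toy data following y = 2x, split across two simulated GPUs.
    shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
    weights = [0.0, 0.0]  # both replicas start from the same weight
    for _ in range(50):
        weights = data_parallel_step(weights, shards, lr=0.02)
    print(weights)
```

With equal-sized shards, the averaged gradient equals the full-batch gradient, so both replicas apply the same update and converge toward w = 2 on this toy data. Skipping step 2 would let each replica drift according to its own shard only.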