by WithinReason 486 days ago
That's 90% bandwidth efficiency and 60% compute efficiency

https://www.nvidia.com/en-us/data-center/h100/
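
For anyone who wants to redo the division, here is a rough sketch. None of the four inputs appear in this thread, so treat them all as assumptions: the achieved figures (about 3000 GB/s and 580 TFLOPS) are what I recall the FlashMLA README reporting, and the peaks (3.35 TB/s and roughly 989 dense BF16 TFLOPS) are what I recall from the H100 SXM datasheet linked above.

```python
# Assumed inputs for illustration; none of them are stated in this thread.
ACHIEVED_BANDWIDTH_GBPS = 3000.0   # assumed: FlashMLA memory-bound figure
ACHIEVED_COMPUTE_TFLOPS = 580.0    # assumed: FlashMLA compute-bound figure
PEAK_BANDWIDTH_GBPS = 3350.0       # assumed: H100 SXM datasheet, 3.35 TB/s
PEAK_COMPUTE_TFLOPS = 989.0        # assumed: H100 SXM dense BF16 tensor peak


def efficiency(achieved: float, peak: float) -> float:
    """Return achieved throughput as a fraction of the hardware peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


bandwidth_eff = efficiency(ACHIEVED_BANDWIDTH_GBPS, PEAK_BANDWIDTH_GBPS)
compute_eff = efficiency(ACHIEVED_COMPUTE_TFLOPS, PEAK_COMPUTE_TFLOPS)

print(f"bandwidth efficiency: {bandwidth_eff:.1%}")
print(f"compute efficiency:   {compute_eff:.1%}")
```

With those assumed inputs the ratios land just under 90% and just under 60%, which matches the rounded figures quoted above.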


They don't have H100s. Wink, wink.
They have H800s, which have exactly the same memory bandwidth and peak FLOPS.
What about NVLink? Does it play a role here?
For FlashMLA? No. The code here runs on a single GPU and has no built-in communication component.
But for training it does. You need to communicate gradients between GPUs.
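
To make the gradient exchange concrete, here is a toy sketch that simulates it in plain Python. It assumes simple data-parallel training where every worker ends up with the element-wise mean of all local gradients. The function name `all_reduce_mean` and the sample values are made up for illustration, and nothing here touches real GPUs, NVLink, or any communication library.

```python
from typing import List


def all_reduce_mean(local_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an all-reduce: every worker receives the mean gradient.

    local_grads[w] is the gradient vector computed by worker w.
    """
    num_workers = len(local_grads)
    if num_workers == 0:
        raise ValueError("need at least one worker")
    length = len(local_grads[0])
    if any(len(g) != length for g in local_grads):
        raise ValueError("all gradient vectors must have the same length")

    # Element-wise mean across workers. On real hardware this is the step
    # that moves data over the GPU interconnect.
    mean = [sum(g[i] for g in local_grads) / num_workers for i in range(length)]

    # Each worker gets its own copy of the synchronized gradient.
    return [list(mean) for _ in range(num_workers)]


# Two hypothetical workers with different local gradients.
local_grads = [[1.0, 2.0], [3.0, 6.0]]
synced = all_reduce_mean(local_grads)
print(synced)
```

Every training step repeats this exchange for the full set of model gradients, which is why interconnect bandwidth matters for training but not for a single-GPU kernel.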