| HN Mirror



	by WithinReason 486 days ago
	That's roughly 90% bandwidth efficiency and 60% compute efficiency relative to the H100's peak specs: https://www.nvidia.com/en-us/data-center/h100/
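
For anyone who wants to check the arithmetic, here is a rough sketch of how those two percentages could be derived. All four input numbers are assumptions that do not appear in this thread: the achieved figures (3000 GB/s, 580 TFLOPS) are what I recall from the FlashMLA README, and the peak figures (3.35 TB/s, about 989 dense BF16 TFLOPS) are what I recall from the H100 SXM datasheet linked above. Swap in your own numbers if they differ.

```python
# Assumed inputs, not stated in this thread (see the note above).
ACHIEVED_BANDWIDTH_GBPS = 3000.0   # assumed: reported memory-bound throughput
ACHIEVED_COMPUTE_TFLOPS = 580.0    # assumed: reported compute-bound throughput
PEAK_BANDWIDTH_GBPS = 3350.0       # assumed: H100 SXM peak memory bandwidth
PEAK_COMPUTE_TFLOPS = 989.0        # assumed: H100 SXM dense BF16 peak


def efficiency(achieved: float, peak: float) -> float:
    """Return achieved throughput as a fraction of the hardware peak."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return achieved / peak


bandwidth_eff = efficiency(ACHIEVED_BANDWIDTH_GBPS, PEAK_BANDWIDTH_GBPS)
compute_eff = efficiency(ACHIEVED_COMPUTE_TFLOPS, PEAK_COMPUTE_TFLOPS)

print(f"bandwidth efficiency: {bandwidth_eff:.1%}")
print(f"compute efficiency:   {compute_eff:.1%}")
```

Under these assumptions both ratios land close to the rounded 90% and 60% quoted above.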

1 comment

They don't have H100s. Wink, wink.

They have H800s, which have exactly the same memory bandwidth and peak FLOPS as the H100.

What about NVLink? Does it play a role here?

For FlashMLA? No. The code here runs on a single GPU and does not have a built-in communication component.

But for training it does. You need to communicate gradient updates between GPUs.
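
To make the gradient communication concrete, here is a toy sketch of the averaging step that data-parallel training typically performs after each backward pass. It is only an illustration: plain Python lists stand in for GPUs, the function name `all_reduce_mean` is hypothetical, and a real setup would hand this to a collective communication library running over NVLink or the network rather than looping in Python.

```python
from typing import List


def all_reduce_mean(per_gpu_grads: List[List[float]]) -> List[List[float]]:
    """Simulate an all-reduce (mean): every replica ends up with the
    element-wise average of all replicas' gradients.

    Hypothetical helper for illustration; not a real library API.
    """
    if not per_gpu_grads:
        raise ValueError("need at least one replica")
    n = len(per_gpu_grads)
    length = len(per_gpu_grads[0])
    if any(len(g) != length for g in per_gpu_grads):
        raise ValueError("all replicas must hold gradients of the same shape")

    # Element-wise mean across replicas.
    averaged = [sum(g[i] for g in per_gpu_grads) / n for i in range(length)]

    # Each replica receives its own copy of the averaged gradient.
    return [list(averaged) for _ in range(n)]


# Four simulated GPUs, each with a different local gradient.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = all_reduce_mean(grads)
print(synced[0])
```

The point is that every replica has to see every other replica's gradient before the next step, so the volume of data exchanged grows with model size, which is why interconnect bandwidth matters for training but not for a single-GPU kernel.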