| HN Mirror



	by helloericsf 481 days ago
	X: https://x.com/deepseek_ai/status/1893836827574030466
	BF16 support
	Paged KV cache (block size 64)
	3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800


That's 90% bandwidth efficiency and 60% compute efficiency
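For anyone who wants to check those percentages, here is a rough back-of-the-envelope sketch. The peak figures are my assumption (the commonly quoted H100 SXM datasheet values of 3.35 TB/s HBM bandwidth and about 989 TFLOPS dense BF16), not something stated in the post; only the 3000 GB/s and 580 TFLOPS numbers come from the announcement.

```python
# Assumed peak figures (H100 SXM datasheet values, not from the post).
PEAK_BANDWIDTH_GBPS = 3350.0  # assumed peak HBM bandwidth, GB/s
PEAK_BF16_TFLOPS = 989.0      # assumed peak dense BF16 throughput, TFLOPS

# Achieved figures quoted in the announcement.
ACHIEVED_BANDWIDTH_GBPS = 3000.0
ACHIEVED_BF16_TFLOPS = 580.0


def efficiency(achieved: float, peak: float) -> float:
    """Return achieved throughput as a fraction of the peak."""
    return achieved / peak


bandwidth_eff = efficiency(ACHIEVED_BANDWIDTH_GBPS, PEAK_BANDWIDTH_GBPS)
compute_eff = efficiency(ACHIEVED_BF16_TFLOPS, PEAK_BF16_TFLOPS)

# Prints: bandwidth: 89.6%, compute: 58.6%
print(f"bandwidth: {bandwidth_eff:.1%}, compute: {compute_eff:.1%}")
```

Under those assumed peaks the numbers round to the 90% and 60% quoted above.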

They don't have H100s. Wink, wink.

They have H800s, which have exactly the same memory bandwidth and peak FLOPS.

What about NVLink? Does it play a role here?

For FlashMLA? No. The code here runs on a single GPU and has no built-in communication part.

But for training it does. You need to communicate gradients between GPUs.
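To make the gradient exchange concrete, here is a toy sketch of what an all-reduce (mean) does in data-parallel training. It is a plain-Python simulation with lists standing in for GPUs, not real NCCL or NVLink code, and `all_reduce_mean` is a hypothetical helper name used only for illustration.

```python
def all_reduce_mean(per_gpu_grads):
    """Simulate an all-reduce: every 'GPU' ends up with the element-wise mean.

    per_gpu_grads is a list with one gradient vector (list of floats) per GPU.
    """
    n = len(per_gpu_grads)
    # Sum each gradient element across all simulated GPUs.
    summed = [sum(vals) for vals in zip(*per_gpu_grads)]
    mean = [s / n for s in summed]
    # Each GPU receives its own copy of the averaged gradient.
    return [list(mean) for _ in range(n)]


# Four simulated GPUs, each with a local gradient from its own mini-batch.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
synced = all_reduce_mean(grads)
print(synced[0])  # every GPU now holds the same averaged gradient
```

In a real cluster this exchange happens over the interconnect on every step, which is why link bandwidth matters for training but not for a single-GPU kernel.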