Hacker News
by
helloericsf
481 days ago
X:
https://x.com/deepseek_ai/status/1893836827574030466
BF16 support; paged KV cache (block size 64); 3000 GB/s memory-bound and 580 TFLOPS compute-bound on H800.
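For readers unfamiliar with the term, here is a minimal sketch of how a paged KV cache with block size 64 might map a token position to a storage location. This is a generic illustration, not FlashMLA's actual code or API; the `locate` helper and the `block_table` contents are hypothetical.

```python
BLOCK_SIZE = 64  # block size quoted in the post


def locate(token_pos, block_table):
    """Map a token position in a sequence to (physical_block, offset_in_block).

    block_table is a hypothetical per-sequence list whose i-th entry is the
    physical block that stores logical block i of that sequence.
    """
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    return block_table[logical_block], offset


# Hypothetical sequence whose three logical blocks live in physical blocks 7, 2, 9.
block_table = [7, 2, 9]
print(locate(130, block_table))  # token 130 falls in logical block 2, offset 2
```

The point of the indirection is that a sequence's cache need not be contiguous in memory; only the block table has to be consulted per lookup.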
1 comments
WithinReason
481 days ago
That's roughly 90% bandwidth efficiency and 60% compute efficiency relative to H100 peak specs:
https://www.nvidia.com/en-us/data-center/h100/
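A quick sketch of the arithmetic behind those percentages. The peak figures are assumptions taken from NVIDIA's public H100 SXM specs (3.35 TB/s memory bandwidth, and 1,979 BF16 TFLOPS with sparsity, halved here for dense math); they are not stated in the thread.

```python
# Assumed H100 SXM peak figures (from NVIDIA's public specs, not from the thread).
PEAK_BANDWIDTH_GBPS = 3350.0    # 3.35 TB/s memory bandwidth
PEAK_BF16_DENSE_TFLOPS = 989.5  # half of the 1,979 TFLOPS "with sparsity" figure

# Achieved figures quoted in the post.
ACHIEVED_BANDWIDTH_GBPS = 3000.0
ACHIEVED_TFLOPS = 580.0


def efficiency(achieved, peak):
    """Fraction of the theoretical peak that was actually reached."""
    return achieved / peak


bandwidth_eff = efficiency(ACHIEVED_BANDWIDTH_GBPS, PEAK_BANDWIDTH_GBPS)
compute_eff = efficiency(ACHIEVED_TFLOPS, PEAK_BF16_DENSE_TFLOPS)
print(f"bandwidth: {bandwidth_eff:.1%}, compute: {compute_eff:.1%}")
```

Under these assumed peaks, the ratios come out just under 90% and just under 60%, which matches the rounded figures above.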
helloericsf
481 days ago
They don't have H100s. Wink, wink.
rfoo
481 days ago
They have H800s, which have exactly the same memory bandwidth and max FLOPS.
pk-protect-ai
481 days ago
What about NVLink? Does it play a role here?
rfoo
481 days ago
For FlashMLA? No. The code here runs on one GPU only and does not have a built-in communication part.
pk-protect-ai
473 days ago
But for training it does. You need to communicate gradient changes between GPUs.