by ebalit 729 days ago
You need two H100s to have enough VRAM for the model, whereas a single MI300X is enough. Doubling the total throughput (across all completions) of one MI300X to simulate the numbers for a duplicated system is therefore reasonable.

They should probably also show the throughput per completion separately, since tensor parallelism is often used to improve it, in addition to doubling the available VRAM.
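
To make the distinction concrete, here is a rough sketch of the two metrics. It assumes throughput is measured in tokens per second and that all concurrent completions progress at the same rate. The function names and the numbers in the usage lines are made-up placeholders, not benchmark results.

```python
def duplicated_system_throughput(single_system_tokens_per_s: float, copies: int = 2) -> float:
    """Aggregate throughput of `copies` independent replicas of one system.

    Assumes the replicas share nothing, so total throughput scales linearly.
    """
    if copies <= 0:
        raise ValueError("copies must be positive")
    return single_system_tokens_per_s * copies


def per_completion_throughput(total_tokens_per_s: float, concurrent_completions: int) -> float:
    """Tokens per second seen by a single completion.

    Assumes every concurrent completion gets an equal share of the total.
    """
    if concurrent_completions <= 0:
        raise ValueError("concurrent_completions must be positive")
    return total_tokens_per_s / concurrent_completions


# Placeholder numbers only: one system serving 10 completions at 1000 tokens/s total.
single_total = 1000.0
single_per_completion = per_completion_throughput(single_total, 10)

# Two replicas serve 20 completions in total, 10 on each replica.
duplicated_total = duplicated_system_throughput(single_total, copies=2)
duplicated_per_completion = per_completion_throughput(duplicated_total, 20)
```

Under these assumptions, duplication doubles the total but leaves the per-completion figure unchanged, which is why the two numbers are worth reporting separately when comparing against a tensor-parallel setup.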

1 comment

What's the cost to run 2x H100 versus 1x MI300X?

I think that'd give us a better idea of perf/cost and whether multiplying MI300X results by 2 is justified.
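
For what it's worth, the perf/cost comparison could be sketched like this. It assumes cost is expressed as an hourly price for the whole system and throughput as total tokens per second. The function name is hypothetical and the numbers are placeholders, not real prices or benchmark results.

```python
def tokens_per_dollar(total_tokens_per_s: float, system_cost_per_hour: float) -> float:
    """Tokens generated per dollar spent, for a system billed by the hour."""
    if system_cost_per_hour <= 0:
        raise ValueError("system_cost_per_hour must be positive")
    return total_tokens_per_s * 3600 / system_cost_per_hour


# Placeholder figures: one system at 1000 tokens/s costing 4.0 per hour.
one_system = tokens_per_dollar(1000.0, 4.0)

# Duplicating that system doubles both the throughput and the hourly cost.
two_systems = tokens_per_dollar(2 * 1000.0, 2 * 4.0)
```

Duplicating a system scales throughput and cost together, so tokens per dollar stays the same in this sketch. A per-dollar comparison therefore does not depend on whether the MI300X results are multiplied by 2, only on the actual price of each setup.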