by ebalit
729 days ago
You need 2 H100s to have enough VRAM for the model, whereas you need only 1 MI300X. Doubling the total throughput (across all completions) of 1 MI300X to simulate the numbers for a duplicated system is reasonable. They should probably also show the throughput per completion separately, since tensor parallelism is often used to improve that in addition to doubling the VRAM.

I think that'd give us a better idea of perf/cost and whether multiplying the MI300X results by 2 is justified.
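
To make the distinction concrete, here is a rough sketch of the arithmetic. All numbers are made-up placeholders, not benchmark results, and the `Setup` class and `duplicate` helper are hypothetical names used only for illustration. The point is that running two independent replicas doubles the aggregate throughput but leaves the throughput per completion unchanged, which is why the two figures should be reported separately.

```python
from dataclasses import dataclass


@dataclass
class Setup:
    name: str
    gpus: int
    total_tokens_per_s: float      # aggregate over all concurrent completions
    concurrent_completions: int

    def per_completion(self) -> float:
        # Speed seen by a single request.
        return self.total_tokens_per_s / self.concurrent_completions

    def per_gpu(self) -> float:
        # Aggregate throughput normalised by GPU count (a perf/cost proxy).
        return self.total_tokens_per_s / self.gpus


def duplicate(s: Setup, copies: int) -> Setup:
    """Model `copies` independent replicas of the same setup.

    Aggregate throughput and concurrency both scale with `copies`,
    so per-completion throughput stays the same.
    """
    return Setup(
        name=f"{copies}x {s.name}",
        gpus=s.gpus * copies,
        total_tokens_per_s=s.total_tokens_per_s * copies,
        concurrent_completions=s.concurrent_completions * copies,
    )


# Placeholder numbers, NOT measurements.
mi300x = Setup("MI300X", gpus=1, total_tokens_per_s=1000.0, concurrent_completions=10)
h100_tp2 = Setup("H100 (TP=2)", gpus=2, total_tokens_per_s=1800.0, concurrent_completions=10)

doubled = duplicate(mi300x, 2)

for s in (mi300x, doubled, h100_tp2):
    print(f"{s.name}: total={s.total_tokens_per_s:.0f} tok/s, "
          f"per completion={s.per_completion():.0f} tok/s, "
          f"per GPU={s.per_gpu():.0f} tok/s")
```

With real measurements plugged in, comparing the per-completion column for the duplicated MI300X against the tensor-parallel H100 pair would show whether the doubling hides a latency difference.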