| HN Mirror



	by ebalit 776 days ago
	You need 2 H100s to have enough VRAM for the model, whereas you need only 1 MI300X. Doubling the total throughput (across all completions) of 1 MI300X to simulate the numbers for a duplicated system is reasonable. They should probably also show the throughput per completion separately, since tensor parallelism is often used to improve that, in addition to doubling the VRAM.

1 comment

What's the cost to run 2x H100 and 1x MI300X?

I think that'd give us a better idea of perf/cost and whether multiplying MI300X results by 2 is justified.
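
For what it's worth, the perf/cost comparison could be sketched roughly like this. The function names are made up for illustration, and the numbers at the bottom are placeholders, not measured throughput or real prices; you'd plug in the benchmark's tokens/s and whatever hourly rate you actually pay for each system.

```python
def throughput_per_dollar(tokens_per_second: float, hourly_cost: float) -> float:
    """Tokens generated per dollar of GPU time for one system."""
    if hourly_cost <= 0:
        raise ValueError("hourly_cost must be positive")
    # 3600 seconds per hour converts tokens/s into tokens/hour.
    return tokens_per_second * 3600 / hourly_cost


def perf_cost_ratio(a_tps: float, a_cost: float, b_tps: float, b_cost: float) -> float:
    """Ratio of system A's tokens-per-dollar to system B's (>1 means A is cheaper per token)."""
    return throughput_per_dollar(a_tps, a_cost) / throughput_per_dollar(b_tps, b_cost)


# Placeholder values only (hypothetical, NOT benchmark results or real prices):
# system A = 2x H100, system B = 1x MI300X.
ratio = perf_cost_ratio(a_tps=1000.0, a_cost=4.0, b_tps=1000.0, b_cost=2.0)
print(ratio)
```

With the ratio in hand, the "multiply MI300X by 2" question becomes whether two MI300X cost about the same as the 2x H100 setup; if they don't, comparing tokens per dollar is the fairer number.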