HN Mirror



	by vladgur 61 days ago
	Curious which models you are able to run, and how many 3090s do they require at scale?


mips_avatar 61 days ago

4 3090s with NVLink bridges on each pair. Super fast inference on MoE models around 20-36B.
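
As a rough way to see why one NVLink pair can hold a model in the 20-36B range, here is a back-of-the-envelope weight-memory estimate. It is a sketch under stated assumptions, not something from the thread: 24 GB per RTX 3090, weights only (KV cache, activations, and runtime overhead are ignored), and the 80% headroom figure is a guess. For MoE models, memory scales with *total* parameters because every expert has to be resident, even though only a few are active per token.

```python
# Back-of-the-envelope weight memory for one model instance on an NVLink pair.
# Assumptions (not from the thread): 24 GB per RTX 3090, weights only,
# and a guessed 80% of VRAM usable for weights (rest for KV cache/overhead).

GPU_VRAM_GB = 24        # RTX 3090 memory per card
GPUS_PER_REPLICA = 2    # one NVLink-bridged pair per model instance


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: billions of params x bytes per param.

    For MoE models, pass the TOTAL parameter count, since all experts
    must be loaded even if only a few are active per token.
    """
    return params_billion * bits_per_weight / 8


def fits_on_pair(params_billion: float, bits_per_weight: float,
                 headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the pair's combined VRAM."""
    budget = GPU_VRAM_GB * GPUS_PER_REPLICA * headroom
    return weight_gb(params_billion, bits_per_weight) <= budget


if __name__ == "__main__":
    for params in (20, 36):
        for bits in (4, 8, 16):
            print(f"{params}B @ {bits}-bit: {weight_gb(params, bits):.1f} GB, "
                  f"fits on a pair: {fits_on_pair(params, bits)}")
```

Under these assumptions a 36B model fits on one pair at 8-bit but not at 16-bit; real headroom depends on context length and batch size.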

embedding-shape 60 days ago

> Super fast inference

How fast is "super fast" exactly, and with what runtime, model, and quant specifically? Curious to see how 4x 3090s compare to 1x Pro 6000. You could probably put together 4x 3090s for a fraction of the cost of a Pro 6000, but every time I've seen the tok/s in/out for multi-GPU setups, my heart drops a little.

mips_avatar 60 days ago

I haven't benchmarked against a Pro 6000; it's more that I have 4 3090s and I don't have a Pro 6000.

embedding-shape 60 days ago

Yes, that's why I'm asking what exactly 4 3090s get in prompt processing and generation; sorry if I was unclear.
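
Prompt processing (prefill) and generation (decode) are usually reported as separate rates, and a single "max output tok/s" figure is often an aggregate across many concurrent requests rather than what one request sees. The sketch below shows one common way to split them from streamed-request timestamps. It assumes you already have a client that records when a request was sent, when the first token arrived, and when the last token arrived; the `RequestTiming` type and function names are hypothetical, and no particular runtime is implied.

```python
# Split streamed-request timings into prefill, decode, and aggregate throughput.
# Hypothetical helper: timestamps come from whatever client streams tokens;
# this code only does the arithmetic.

from dataclasses import dataclass


@dataclass
class RequestTiming:
    prompt_tokens: int
    output_tokens: int
    t_start: float        # request sent (seconds)
    t_first_token: float  # first output token received
    t_end: float          # last output token received


def prefill_tok_s(t: RequestTiming) -> float:
    """Prompt tokens per second, approximated by time to first token."""
    return t.prompt_tokens / (t.t_first_token - t.t_start)


def decode_tok_s(t: RequestTiming) -> float:
    """Output tokens per second after the first token has arrived."""
    if t.output_tokens < 2:
        raise ValueError("need at least two output tokens to measure decode rate")
    return (t.output_tokens - 1) / (t.t_end - t.t_first_token)


def aggregate_output_tok_s(timings: list[RequestTiming]) -> float:
    """Total output tokens across concurrent requests over wall-clock time."""
    total = sum(t.output_tokens for t in timings)
    wall = max(t.t_end for t in timings) - min(t.t_start for t in timings)
    return total / wall


if __name__ == "__main__":
    a = RequestTiming(2000, 501, 0.0, 0.5, 10.5)
    b = RequestTiming(1000, 549, 0.0, 0.5, 10.5)
    print(prefill_tok_s(a), decode_tok_s(a), aggregate_output_tok_s([a, b]))
```

Per-request decode speed and batched aggregate throughput can differ by an order of magnitude on the same hardware, so it helps to say which one a number refers to.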

mips_avatar 60 days ago

Maxes out around 4K tok/s output. Each pair of 3090s runs its own instance of the model, with parallelism across the NVLink bridge, though NVLink is only 2x faster than PCIe 5.
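
The layout described here (two model instances, each split across one NVLink-bridged pair, with requests spread over both) can be sketched as below. This is an illustration under assumptions, not the commenter's setup: it assumes CUDA devices 0/1 and 2/3 are the bridged pairs (worth checking with `nvidia-smi topo -m`, since device order does not always match physical bridging), and the local URLs, ports, and round-robin policy are hypothetical.

```python
# Sketch: 4 GPUs -> two NVLink pairs, one model instance per pair
# (tensor-parallel inside the pair), requests spread round-robin.
# Assumptions: devices 0/1 and 2/3 are the bridged pairs; URLs/ports are
# hypothetical placeholders for whatever runtime serves each instance.

import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    gpu_ids: tuple[int, ...]   # GPUs joined by one NVLink bridge
    url: str                   # hypothetical local endpoint for this instance

    @property
    def cuda_visible_devices(self) -> str:
        """Value to export as CUDA_VISIBLE_DEVICES when launching this instance."""
        return ",".join(str(g) for g in self.gpu_ids)


def make_replicas(num_gpus: int = 4, gpus_per_replica: int = 2,
                  base_port: int = 8000) -> list[Replica]:
    """Group consecutive GPU ids into pairs, one model instance per pair."""
    if num_gpus % gpus_per_replica:
        raise ValueError("GPU count must divide evenly into replicas")
    replicas = []
    for i in range(num_gpus // gpus_per_replica):
        ids = tuple(range(i * gpus_per_replica, (i + 1) * gpus_per_replica))
        replicas.append(Replica(ids, f"http://127.0.0.1:{base_port + i}"))
    return replicas


def round_robin(replicas: list[Replica]):
    """Cycle through instances so each new request goes to the next one."""
    return itertools.cycle(replicas)


if __name__ == "__main__":
    reps = make_replicas()
    for r in reps:
        print(r.cuda_visible_devices, r.url)
    picker = round_robin(reps)
    for req in range(4):
        print(f"request {req} -> {next(picker).url}")
```

Keeping tensor parallelism inside a pair means the heavy per-layer traffic stays on the NVLink bridge, while the two instances never need to talk to each other.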