| HN Mirror



	by aussieguy1234 781 days ago
	I'm looking at buying 2x RTX 3060s to run Llama 70B on the new PC I just purchased. Will this work, or do I need a Tesla P40 or two?

3 comments

tarruda 780 days ago

Note that two RTX 3060s will probably be significantly slower than an RTX 4090.

Even with an RTX 4090, 2 tokens per second is very slow and likely not ideal for most tasks. It is impressive (much faster than previous solutions), but still very slow for real-time use.

If you want to run Llama 3 70B, it might be better to purchase a Mac Studio with 64GB RAM (more for longer contexts) and run it with 4-bit quantization.

My 2 cents: for most common tasks Llama 3 8B will be more than enough, and you can run that at full precision on a single RTX 3090. At a much lower cost, you can also run Llama 3 8B with 8-bit quantization on a single RTX 3060, if it has 12GB of VRAM.
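
As a rough check on the sizing above, here is a minimal sketch that estimates the memory needed for model weights as parameters times bits per weight. The helper names (`weight_gb`, `fits`) and the 20% overhead factor for KV cache and activations are assumptions for illustration, not measured values, and "full precision" is taken to mean 16-bit weights.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead: float = 1.2) -> bool:
    """True if weights plus an assumed overhead factor fit in memory_gb.

    overhead=1.2 is a guessed allowance for KV cache and activations;
    real usage depends on context length and the runtime.
    """
    return weight_gb(params_billion, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    # (label, billions of parameters, bits per weight, available memory in GB)
    cases = [
        ("70B, 4-bit, 2x 12GB GPUs", 70, 4, 24),
        ("70B, 4-bit, 64GB unified memory", 70, 4, 64),
        ("8B, 16-bit, one 24GB GPU", 8, 16, 24),
        ("8B, 8-bit, one 12GB GPU", 8, 8, 12),
    ]
    for label, params, bits, mem in cases:
        size = weight_gb(params, bits)
        verdict = "fits" if fits(params, bits, mem) else "does not fit"
        print(f"{label}: ~{size:.0f} GB of weights, {verdict}")
```

Under these assumptions a 4-bit 70B model needs about 35 GB for weights alone, which is more than two 12GB cards provide, while both 8B configurations fit on a single card.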


dannyw 781 days ago

Theoretically there's no reason why this shouldn't work, but you will likely find the software isn't designed for multi-GPU use and may have to reimplement or fix things yourself.

You will also be getting about 720GB/s of memory bandwidth with 2x 3060, instead of about 1TB/s with the 4090, so expect lower performance.
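
To see why bandwidth matters, a common back-of-envelope bound treats token generation as memory-bound: each generated token reads every weight once, so tokens per second cannot exceed bandwidth divided by model size. The sketch below applies that bound using the bandwidth figures quoted above; the function name and the 35 GB model size (a 70B model at 4 bits per weight) are assumptions, and the bound only holds if the whole model sits in GPU memory.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_gb: float) -> float:
    """Upper bound on decode speed if every weight is read once per token."""
    return bandwidth_gb_s / model_gb


if __name__ == "__main__":
    model_gb = 35.0  # assumed: 70B parameters at 4 bits per weight

    # Aggregate figure for two cards; only reachable if both GPUs
    # read their share of the weights at the same time.
    print(f"2x 3060, parallel:   {max_tokens_per_second(720, model_gb):.1f} tok/s")

    # If layers are split across cards and run one after the other,
    # only one card's bandwidth (720 / 2) is in use at any moment.
    print(f"2x 3060, sequential: {max_tokens_per_second(360, model_gb):.1f} tok/s")

    print(f"4090:                {max_tokens_per_second(1000, model_gb):.1f} tok/s")
```

These are ceilings, not predictions. Real throughput is lower, and drops sharply once part of the model spills out of VRAM into system memory.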


34679 780 days ago

I picked up a couple of RTX 4060 Ti cards (the 16GB version) for $450 each a couple of days ago from Best Buy. I had been looking at the 3060 like yourself. I installed LM Studio and have been trying out a bunch of models with varying levels of quantization, completely pain-free.
