HN Mirror

	by jszymborski 506 days ago
	The trouble now is finding an RTX 4090.

2 comments

hnuser123456 506 days ago

RTX 3090s are easy to find and work just as well.

petercooper 505 days ago

Running the Q4 quant (14GB or so in size) at 46 tok/sec on a 3090 Ti right now, if anyone's curious about performance. Want the headroom to try and max out the context.

earleybird 505 days ago

Interesting - the _q4 on a pair of 12GB 3060s runs at 20 tok/sec. The _q8 (25GB) on the same pair runs at about 4 tok/sec.

petercooper 505 days ago

The ~360GB/s memory bandwidth on the 3060, versus ~1008GB/s on the 3090 Ti, probably accounts for that.

Given that, I'd expect a single 3060 (if a large enough one existed) to run at about 16 tok/s, so 20 tok/s on two isn't bad considering they aren't NVLinked.
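
For anyone who wants to redo that estimate with their own card, here is a rough sketch of the arithmetic. It assumes token generation is purely memory-bandwidth bound, so throughput scales linearly with bandwidth; that is a simplification, and the function name `scale_by_bandwidth` is made up for illustration. The figures are the approximate ones quoted above.

```python
def scale_by_bandwidth(measured_tok_s, measured_bw_gb_s, target_bw_gb_s):
    """Estimate tok/s on a target GPU from a measurement on another GPU.

    Assumption: decoding is limited only by memory bandwidth, so
    throughput is proportional to bandwidth. Real results will differ.
    """
    return measured_tok_s * target_bw_gb_s / measured_bw_gb_s


# 46 tok/s measured on a 3090 Ti (~1008 GB/s), scaled to a 3060 (~360 GB/s)
estimate = scale_by_bandwidth(46.0, 1008.0, 360.0)
print(f"{estimate:.1f} tok/s")  # prints "16.4 tok/s"
```

Swap in your own measured speed and the two bandwidth numbers to get a ballpark for a different pair of cards.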

benkaiser 505 days ago

Runs on an AMD 7900 XTX at about 20 tokens per second using LM Studio + Vulkan.
