| HN Mirror



	by Tostino 1030 days ago
	1x 3090 is, IMO, about the minimum worth spending time on. It can serve a 13B and a 7B model at the same time, it can QLoRA-train a 13B model with a ton of context length, and it's fast enough to iterate on training runs. I have 2x 3090 in my machine, and I get ~40 tokens/sec for inference on a 13B Llama 2 model on one card. Splitting the 70B model between the two cards gets me ~12-15 tokens/sec. Sadly, I can't train the 70B model on my 2x 3090; there's not quite enough VRAM.
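A rough back-of-the-envelope sketch of why those model sizes fit (or don't). The comment doesn't say which precision is used, so the 4-bit weights here are an assumption, as are the helper names `weight_gib` and `fits`. The 24 GiB figure is the 3090's VRAM. This counts weights only; KV cache, activations, and framework overhead all come on top, so treat the results as lower bounds.

```python
GIB = 1024 ** 3
VRAM_3090_GIB = 24  # VRAM on a single RTX 3090


def weight_gib(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / GIB


def fits(models_billion, cards: int = 1, bits_per_weight: int = 4) -> bool:
    """True if the weights of all listed models fit in the combined VRAM.

    Ignores KV cache, activations, and the cost of splitting across cards.
    """
    total = sum(weight_gib(p, bits_per_weight) for p in models_billion)
    return total <= cards * VRAM_3090_GIB


if __name__ == "__main__":
    # Assumed 4-bit quantization throughout (not stated in the comment).
    print(f"13B + 7B: {weight_gib(13, 4) + weight_gib(7, 4):.1f} GiB")
    print(f"70B:      {weight_gib(70, 4):.1f} GiB")
    print("13B + 7B on one card: ", fits([13, 7], cards=1))
    print("70B on one card:      ", fits([70], cards=1))
    print("70B on two cards:     ", fits([70], cards=2))
    # At 16 bits per weight, 13B no longer fits on one card.
    print("13B fp16 on one card: ", fits([13], cards=1, bits_per_weight=16))
```

Under these assumptions the 70B weights take up most of the combined 48 GiB, which is consistent with inference fitting across two cards while training, which also needs room for activations and adapter/optimizer state, does not.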