


	by rig666 1039 days ago
	Just a suggestion, but there are 4-bit quantized models that are even smaller and faster than the 8-bit ones. Your average 13B 4-bit model needs only about 8-9 GB of VRAM. Using this, I bet you could run a much higher parameter model on the 3090.
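
For a rough sense of the arithmetic behind those numbers, here is a back-of-the-envelope sketch. It counts only the raw weights (parameters times bits per weight); the function name `weight_memory_gb` is made up for illustration. Real quantized formats store extra scaling data, and inference also needs room for the KV cache and activations, which is presumably why a 13B 4-bit model lands nearer 8-9 GB than the raw figure.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes).

    Ignores quantization metadata, KV cache, and activations, so real
    usage will be higher than this estimate.
    """
    return n_params * bits_per_weight / 8 / 1e9


# Compare the model sizes and bit widths discussed in this thread.
for name, n_params in [("13B", 13e9), ("70B", 70e9)]:
    for bits in (4, 8, 16):
        gb = weight_memory_gb(n_params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.1f} GB of weights")
```

By this estimate a 13B model is about 6.5 GB of weights at 4-bit and 13 GB at 8-bit, both within the 24 GB on a 3090, while 70B at 4-bit is about 35 GB of weights and would not fit on a single card.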

1 comment

neilv 1039 days ago

I was using various 4-bit quantized models earlier, but decided to go back to 8-bit for 13B, since I had the VRAM anyway, and (at the time, for other reasons) was seeing some quirky behavior.

70B is currently 4-bit on this box, and once I have GPU accel for 70B, I'll see how the quality compares to 13B 8-bit.
