by rig666 1039 days ago
Just a suggestion, but there are 4-bit quantized models that are even smaller and faster than the 8-bit ones. Your average 13B 4-bit model needs only about 8-9 GB of VRAM. Using these, I bet you could run a much higher-parameter model on the 3090.
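For a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and quantization metadata, so real usage will be somewhat higher. The helper name `weight_gb` is made up for illustration.

```python
def weight_gb(params_billion: float, bits: int) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at exactly `bits` bits, with no
    per-block scales or other overhead.
    """
    return params_billion * 1e9 * bits / 8 / 1e9


VRAM_3090_GB = 24  # an RTX 3090 has 24 GB of VRAM

for name, params, bits in [
    ("13B 4-bit", 13, 4),
    ("13B 8-bit", 13, 8),
    ("70B 4-bit", 70, 4),
]:
    gb = weight_gb(params, bits)
    verdict = "under" if gb < VRAM_3090_GB else "over"
    print(f"{name}: ~{gb:.1f} GB of weights ({verdict} {VRAM_3090_GB} GB)")
```

By this estimate the weights alone are about 6.5 GB for 13B at 4-bit, which is in line with the 8-9 GB figure once runtime overhead is added, while 70B at 4-bit comes to about 35 GB and would not fit entirely in a single 3090's VRAM.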
1 comments

I was using various 4-bit quantized models earlier, but decided to go back to 8-bit for 13B, since I had the VRAM anyway and (at the time, for other reasons) was seeing some quirky behavior.

70B is currently 4-bit on this box, and once I have GPU accel for 70B, I'll see how the quality compares to 13B 8-bit.