by redrove 132 days ago
I have a 3090 and a 4090 and it all fits in VRAM with Q4_0 and quantized KV cache, 96k ctx. 1400 t/s prompt processing, 80 t/s generation.
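
For anyone wondering why quantizing the KV cache matters at 96k context, here is a rough back-of-the-envelope sketch. The model dimensions (48 layers, 8 KV heads, head dim 128) are hypothetical placeholders, not the model discussed here, and "96k" is assumed to mean 96 * 1024 tokens. It also assumes a plain full-attention cache with no sliding window or other cache-saving tricks. The bits-per-value figures come from the ggml block layouts: Q4_0 packs 32 values into 18 bytes, Q8_0 packs 32 values into 34 bytes.

```python
# Rough KV-cache size estimate. The model dimensions used below are
# hypothetical placeholders, not the model from the comment above.

# Effective bits per cached value for common ggml cache types:
# Q4_0 = 18 bytes per 32 values, Q8_0 = 34 bytes per 32 values.
BITS_PER_ELEM = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bits_per_elem):
    """Bytes needed for the K and V caches of all layers at full context."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = K and V
    return elems_per_token * n_ctx * bits_per_elem / 8


if __name__ == "__main__":
    n_ctx = 96 * 1024  # assumed meaning of "96k ctx"
    for name, bits in BITS_PER_ELEM.items():
        size = kv_cache_bytes(
            n_layers=48, n_kv_heads=8, head_dim=128,
            n_ctx=n_ctx, bits_per_elem=bits,
        )
        print(f"{name}: {size / 2**30:.2f} GiB")
```

With only 48 GB across the two cards, the difference between an f16 and a 4-bit cache can decide whether the weights plus a long context fit at all. Plug in the real layer and head counts for your model to get a meaningful number.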