Hacker News new | ask | show | jobs
by vardalab 101 days ago
Q4 quants on 32G VRAM gives you 131K context for 35BA3B and 27B models who are pretty capable. On 5090 one gets 175 tg and ~7K pp with 35BA3B, 27B isaround 90 tg. So speed is awesome. Even Strix 395 gives 40 tk/s and 256K context. Pretty amazing, there is a reason people are excited about qwen 3.5