Hacker News
by regularfry 638 days ago
Qwen2.5 has a 32B release, and quantised at q5_k_m it *just about* completely fills a 4090.

It's a good model, too.

1 comment

Do you also need space for context on the card to get decent speed though?
Depends how much you need. Dropping to q4_k_m gives you 3GB back if that makes the difference.
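
For anyone who wants to sanity-check the trade-off, here is a rough back-of-envelope sketch. Every constant in it is an assumption rather than a measurement: the parameter count (about 32.8B), the average bits per weight for each quant (about 5.7 for q5_k_m, about 4.85 for q4_k_m), and the attention shape used for the KV cache (64 layers, 8 KV heads, head dimension 128, fp16 cache). Real GGUF files and runtimes add overhead that this ignores.

```python
# Rough VRAM estimate. All constants are assumptions, not measurements.
GIB = 1024 ** 3

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_tokens: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GiB.

    Two tensors (K and V) per layer, one head_dim vector per KV head per token.
    """
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * n_tokens / GIB

N_PARAMS = 32.8e9                            # assumed parameter count
BPW = {"q5_k_m": 5.7, "q4_k_m": 4.85}        # assumed average bits per weight
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128      # assumed attention shape

if __name__ == "__main__":
    for name, bpw in BPW.items():
        print(f"{name}: ~{weights_gib(N_PARAMS, bpw):.1f} GiB of weights")
    saved = weights_gib(N_PARAMS, BPW["q5_k_m"]) - weights_gib(N_PARAMS, BPW["q4_k_m"])
    print(f"q5_k_m -> q4_k_m frees ~{saved:.1f} GiB")
    for ctx in (4096, 8192, 16384):
        print(f"{ctx} tokens of context: ~{kv_cache_gib(ctx, LAYERS, KV_HEADS, HEAD_DIM):.1f} GiB of KV cache")
```

Under those assumptions the saving from dropping a quant level comes out at a little over 3 GiB, and an fp16 KV cache costs about 0.25 MiB per token, so the freed space would cover somewhere around 8k to 12k tokens of context before other overheads.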