by foxhop 633 days ago
4090 has 24G

So we really need a ~40B (or ~40 GB) model across two cards, or something like a ~20B model with some room left for the context window.

5090 has ??G - still unreleased

1 comments

Qwen2.5 has a 32B release, and quantised at q5_k_m it *just about* completely fills a 4090.

It's a good model, too.

Do you also need space for context on the card to get decent speed though?

Depends how much you need. Dropping to q4_k_m gives you 3GB back if that makes the difference.
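
For anyone who wants to sanity-check the numbers in this thread, here is a rough back-of-envelope sketch in Python. Everything in it is an assumption made for illustration: the bits-per-weight figures are approximate averages for these quant types, the layer/head counts are assumed values for a 32B-class model with grouped-query attention, and the helper names `weight_gb` and `kv_cache_gb` are made up here. It ignores runtime overhead (compute buffers, the desktop using some VRAM), so treat the results as a lower bound.

```python
# Back-of-envelope VRAM estimate for a quantised model plus its KV cache.
# All constants below are assumptions for illustration, not measured values.

# Approximate average bits per weight for two quant types (assumed).
APPROX_BITS_PER_WEIGHT = {"q4_k_m": 4.85, "q5_k_m": 5.7}


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB (1e9 bytes)."""
    return params_billion * APPROX_BITS_PER_WEIGHT[quant] / 8


def kv_cache_gb(tokens: int, layers: int = 64, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB for a given context length.

    The defaults are assumed values for a 32B-class model with an fp16
    cache; check the real model config before relying on them.
    """
    # Factor of 2 covers both keys and values, stored per layer.
    per_token_bytes = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token_bytes / 1e9


if __name__ == "__main__":
    vram_gb = 24.0  # the 4090 figure from the original post
    for quant in ("q5_k_m", "q4_k_m"):
        w = weight_gb(32, quant)
        print(f"{quant}: weights ~{w:.1f} GB, headroom ~{vram_gb - w:.1f} GB")
    print(f"8k-token KV cache ~{kv_cache_gb(8192):.1f} GB")
```

With these assumed numbers, q5_k_m leaves only about 1 GB free on a 24 GB card, while q4_k_m frees up a little over 3 GB more, which is roughly the difference between fitting a few thousand tokens of context and not.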