
by foxhop 633 days ago

4090 has 24G

So we really need a ~40B model (roughly 40G, split across two cards), or something like a ~20B that leaves some room for the context window.

5090 has ??G - still unreleased


Qwen2.5 has a 32B release, and quantised at q5_k_m it *just about* completely fills a 4090.
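For a rough sanity check of that claim, here is a back-of-the-envelope sketch. The bits-per-weight figure for q5_k_m is my own approximate assumption (k-quants mix block types, so the real average varies by model), and `weight_gb` is just a hypothetical helper, not anything from a library.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantised weights in GB (10^9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8


CARD_GB = 24        # 4090, as stated above
Q5_K_M_BPW = 5.7    # ASSUMPTION: rough average bits per weight for q5_k_m

used = weight_gb(32, Q5_K_M_BPW)
print(f"weights ~{used:.1f} GB, headroom ~{CARD_GB - used:.1f} GB")
```

Under those assumptions the weights alone land within a GB or two of the card's 24G, which matches the "just about fills it" experience. It ignores runtime overhead and anything else using the GPU.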

It's a good model, too.

Do you also need space for context on the card to get decent speed though?

Depends how much you need. Dropping to q4_k_m gives you 3GB back if that makes the difference.
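To put a number on how much context 3GB might buy, here is a minimal sketch of the usual KV cache estimate (one key and one value vector per layer per token). The layer count, KV head count, head size and fp16 cache below are illustrative assumptions, not confirmed values for Qwen2.5 32B, so plug in the real ones from the model config.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """Bytes of KV cache needed per token of context."""
    # factor 2 = one key vector plus one value vector per layer
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(free_bytes: float, per_token: int) -> int:
    """How many tokens of context fit in the given free memory."""
    return int(free_bytes // per_token)


# ASSUMPTION: hypothetical architecture numbers, fp16 cache (2 bytes/element)
per_token = kv_bytes_per_token(layers=64, kv_heads=8, head_dim=128)
freed = 3e9  # the ~3GB gained by dropping to q4_k_m, per the comment above

print(per_token, "bytes per token")
print(tokens_that_fit(freed, per_token), "tokens of context in the freed space")
```

With those made-up-but-plausible numbers, the freed space is worth on the order of ten thousand tokens of context. A model without grouped KV heads, or a longer head size, would need proportionally more per token.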