Hacker News
by chessgecko 814 days ago
Going above 24GB is probably not going to be cheap until GDDR7 is out, and even that will only push it to 36GB. The fancier stacked GDDR6 stuff is probably pretty expensive, and you can’t just add more dies because of signal integrity issues.

Assuming you want to maintain full bandwidth.

Which I don't care too much about.

However, even 16->24GB is a big step, since a lot of models are developed for 3090/4090-class hardware. 36GB would place it close to the class of the fancy 40GB data center cards.

If Intel decides to push VRAM, it will definitely have a market. Critically, a lot of folks will also be incentivized to make software compatible, since it will be the cheapest way to run models.

At this point, I cannot run an entire class of models without OOM. I will take a performance hit if it lets me run it at all.
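For a rough sense of where the OOM line falls, here is a back-of-the-envelope sketch. It assumes memory use is dominated by the weights, plus a flat 20% overhead for activations and cache; that overhead figure and the function names are my own placeholders, not measurements.

```python
def weights_gb(params_billion, bits_per_param):
    """Size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_param / 8


def fits(params_billion, bits_per_param, vram_gb, overhead_fraction=0.2):
    """True if weights plus an assumed flat overhead fit in VRAM.

    overhead_fraction is a guess; real overhead depends on context
    length, batch size, and the runtime.
    """
    needed = weights_gb(params_billion, bits_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


if __name__ == "__main__":
    # Card sizes discussed in the thread: 16GB, 24GB, 36GB.
    for params, bits in [(7, 16), (13, 16), (13, 8), (30, 4)]:
        row = ", ".join(
            f"{vram}GB: {'ok' if fits(params, bits, vram) else 'OOM'}"
            for vram in (16, 24, 36)
        )
        print(f"{params}B @ {bits}-bit -> {row}")
```

Under these assumptions, a 7B model at 16-bit misses a 16GB card but fits in 24GB, which is the kind of step the 16->24GB jump buys.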

I want a consumer card that can do some number of tokens per second. I do not need a monster that can serve as the basis for a startup.

A maxed out Mac Studio probably fits your requirements as stated.
If I were willing to drop $4k on that setup, I might as well get the real NVIDIA offering.

The hobbyist market needs something priced well under $1k to make it accessible.

How come you don't care about full bandwidth?
Mostly because I use this for development.

If a model takes twice as long to run... I'll live. Worst-case, it will be mildly annoying.

If I can't run a model, that's a critical failure.

There's a huge step up CPU->GPU which I need, but 3060 versus 4090 isn't a big deal at all. Indeed, the 24GB versus 16GB is a bigger difference than the number of CUDA cores.

The thing about RAM speed (aka bandwidth) is that it becomes irrelevant if you run out and have to page out to slower tiers of storage.
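To put rough numbers on that, here is a toy model. It assumes generation is purely memory-bandwidth bound and that every weight is read once per token, and it ignores compute, caching, and any overlap between tiers. The bandwidth figures are made-up placeholders, not specs for any real card.

```python
def tokens_per_second(model_gb, vram_gb, vram_bw_gbps, spill_bw_gbps):
    """Rough tokens/sec when each token requires reading all weights once.

    Whatever does not fit in VRAM is read from a slower tier at
    spill_bw_gbps. All bandwidths are in GB/s.
    """
    in_vram = min(model_gb, vram_gb)
    spilled = max(model_gb - vram_gb, 0)
    seconds_per_token = in_vram / vram_bw_gbps + spilled / spill_bw_gbps
    return 1 / seconds_per_token


# Hypothetical 20GB model on two hypothetical cards.
MODEL_GB = 20
# Bigger but slower card: 24GB at 450 GB/s, nothing spills.
slow_big = tokens_per_second(MODEL_GB, 24, 450, 25)
# Smaller but faster card: 16GB at 900 GB/s, 4GB spills to a 25 GB/s tier.
fast_small = tokens_per_second(MODEL_GB, 16, 900, 25)

if __name__ == "__main__":
    print(f"24GB @ 450 GB/s: {slow_big:.1f} tok/s")
    print(f"16GB @ 900 GB/s with spill: {fast_small:.1f} tok/s")
```

With these placeholder numbers, the card with half the bandwidth still comes out several times faster, because the 4GB that spills dominates the per-token time on the smaller card.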