	by VikingCoder 849 days ago
	I'm curious - where are the GPUs with decent processing power but enormous memory? Seems like there'd be a big market for them.

8 comments

wongarsu 849 days ago

Nvidia is making way too much money keeping cards with lots of memory exclusive to server GPUs they sell with insanely high margins.

AMD still suffers from limited resources and doesn't seem willing to spend much chasing a market that might just be temporary hype. Google's TPUs are a pain to use and seem to have stalled out. Intel lacks commitment, and even its products that went roughly in that direction aren't a great match for neural networks because of their philosophy of having fewer, more complex cores.

ls612 849 days ago

MacBooks with M2 or M3 Max. I’m serious. They perform like a 2070 or 2080 but have up to 128GB of unified memory, most of which can be used as VRAM.

ttul 848 days ago

MPS is promising and the memory bandwidth is definitely there, but stable diffusion performance on Apple Silicon remains terribly poor compared with consumer Nvidia cards (in my humble opinion). Perhaps this is partly because so many bits of the SD ecosystem are tied to Nvidia primitives.

ummonk 848 days ago

Image diffusion models tend to have relatively low memory requirements compared to LLMs (and don’t benefit from batching), so having access to 128 GB of unified memory is kinda pointless.

Filligree 848 days ago

They do benefit from batching; up to a 50% performance improvement, in my experience.

That might seem small compared to LLMs, but it isn't small in absolute terms.

ls612 848 days ago

I got a 2x jump on my 4090 from batching SDXL.

ls612 848 days ago

Stable Diffusion will run fine on a 3090, or a 4070 Ti Super and higher.

declaredapple 848 days ago

How many tokens/s are we talking for a 70B model?

Last I saw they performed really poorly, like low single digits t/s. Don't get me wrong, they're probably a decent value for experimenting, but that is flat-out pathetic compared to an A100 or H100. And I think they're useless for training?

smcleod 848 days ago

You can run a 180B model like Falcon at Q4 at around 4-5 tok/s, a 120B model like Goliath at Q4 at around 6-10 tok/s, and a 70B model at Q4 at around 8-12 tok/s, with smaller models much quicker, but it really depends on the context size, model architecture and other settings. An A100 or H100 is obviously going to be a lot faster, but it costs significantly more once you take its supporting requirements into account, and it can't be run on a light, battery-powered laptop.

int_19h 848 days ago

For text inference, what you want is an M1/M2 Ultra with its 800 GB/s RAM. The Max only goes up to 400 GB/s.
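
A rough sketch of why memory bandwidth matters so much for text inference, assuming decoding is purely bandwidth-bound (every weight is read once per generated token) and that a Q4 model takes about 4 bits per weight. Both are simplifying assumptions that are not stated in the thread, and the function name is made up for illustration. Real throughput will land below this ceiling.

```python
def max_tokens_per_second(bandwidth_gb_s, params_billion, bits_per_weight):
    """Bandwidth-bound ceiling on decode speed.

    Assumes each generated token requires streaming all model weights
    from memory exactly once, and ignores KV cache traffic and compute.
    """
    # Billions of parameters * bits per weight / 8 = model size in GB.
    model_size_gb = params_billion * bits_per_weight / 8
    return bandwidth_gb_s / model_size_gb


# 70B model at ~4 bits per weight (assumed), on 400 GB/s vs 800 GB/s memory.
for name, bandwidth in (("Max", 400), ("Ultra", 800)):
    ceiling = max_tokens_per_second(bandwidth, 70, 4)
    print(f"{name}: at most {ceiling:.1f} tok/s")
```

Under these assumptions, doubling the bandwidth doubles the ceiling, which is the reason to prefer the Ultra over the Max for this workload.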

ls612 848 days ago

Yeah, but the Ultra only comes in desktop machines, which may be limiting for some.

int_19h 848 days ago

But that's no different from mid-to-high-end GPUs, which is what the original ask was about.

SV_BubbleTime 849 days ago

I’ll bet you the Nvidia 50xx series will have cards that are asymmetric for this reason. But nothing that will cannibalize their gaming market.

You’ll be able to get higher resolution but slowly. Or pay the $2800 for a 5090 and get high res with good speed.

m463 846 days ago

I kind of wonder if gaming will start incorporating AI stuff. What if, instead of generating a Stable Diffusion image, you could generate levels and monsters?

weebull 848 days ago

I think the AMD 8600XT is a nod in this direction, otherwise there was little point in releasing it.

GPUs need a decent virtual memory system though. The current "it runs or it crashes" situation isn't good enough.

pbhjpbhj 848 days ago

Nvidia has a system for DMA from GPU to system memory, GPUDirect. That seems like a potentially better route if latency can be handled well.

nick238 848 days ago

GPU memory is all about bandwidth, not latency. DDR5 can do 4-8 GT/s x 64-bit bus per DIMM, so it maxes out at 128 GB/s with a dual memory controller, or 512 GB/s with 8x memory controllers on server chips. GDDR6 can run at twice the frequency and has a memory bus ~5x as wide in the 4090, so you get an order of magnitude bump in throughput: nearly 1 TB/s on a consumer product. Datacenter GPUs (e.g. A100) with HBM2e double that to 2 TB/s.
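
The arithmetic above can be sketched as peak bandwidth = transfer rate x bus width / 8 x number of channels. This is a minimal illustration with a made-up function name. The DDR5 figures come from the comment; the 21 GT/s rate and 384-bit bus used for the 4090 are assumptions that are not stated in the comment.

```python
def peak_bandwidth_gb_s(transfer_rate_gt_s, bus_width_bits, channels=1):
    """Theoretical peak memory bandwidth in GB/s.

    transfer_rate_gt_s: transfers per second, in billions (GT/s)
    bus_width_bits:     width of one channel/bus in bits
    channels:           number of independent channels or controllers
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_gt_s * bytes_per_transfer * channels


# DDR5 at the top of the quoted 4-8 GT/s range, 64-bit bus per channel.
print("DDR5, dual channel:", peak_bandwidth_gb_s(8, 64, channels=2), "GB/s")
print("DDR5, 8 channels:  ", peak_bandwidth_gb_s(8, 64, channels=8), "GB/s")

# 4090-class card: 21 GT/s on a 384-bit bus (assumed figures).
print("4090-class GPU:    ", peak_bandwidth_gb_s(21, 384), "GB/s")
```

These are theoretical peaks; sustained bandwidth in real workloads is lower.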

iosjunkie 849 days ago

I dream of AMD or Intel creating cards to do just that

3abiton 848 days ago

Tesla P40

p1esk 849 days ago

H200 has 141GB, B100 (out next month) will probably have even more. How much memory do you need?

holoduke 849 days ago

We need 128 GB with a 4070 chip for about 2000 dollars. That's what we want.

duffyjp 848 days ago

I've never tried it, but on Windows you can have CUDA apps fall back to system RAM when GPU VRAM is exhausted. You could slap 128 GB in your rig with a 4070. I'm sure performance falls off a cliff, but if it's the difference between possible and impossible, that might be acceptable.

https://nvidia.custhelp.com/app/answers/detail/a_id/5490/~/s...

ttul 848 days ago

Nvidia will not build that any time soon. RAM is the dividing line between charging $40,000 vs $2500…

qwertox 848 days ago

Please give me some DIMM slots on the GPU so that I can choose my own memory, as I'm used to in the CPU world, and reuse it when I upgrade my GPU.

int_19h 848 days ago

An M1 Mac Studio with that much RAM can be had for around $3K if you look for good deals, and will give you ~8 tok/s on a 70B model, or ~5 tok/s for a 120B one.

ta_1138 848 days ago

Unfortunately, production capacity for that is limited, and with sufficient demand, all pricing is an auction. Therefore, we aren't going to be seeing that card for years.

FeepingCreature 848 days ago

Yes please.