by pepijndevos 2536 days ago
Sadly, I don't have 16GB GPU memory...
8 comments

I suspect NVIDIA strategically releases models that just barely don't fit into the memory of its gaming-grade GPUs.

Clever market segmentation.

I suspect it’s convenience that has a side effect of being strategic. Research is always interested in what is only just now possible with the newest available hardware. The fact that it encourages you to pay for the most recent hardware motivates funding the research.
> NVIDIA strategically releases

You are giving too much credit to their ability to release a smaller model at all.

GPU cluster? You might need to tweak the PyTorch code, but you can distribute a model across multiple GPUs very easily with PyTorch.
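
To make the multi-GPU idea concrete, here is a rough planning sketch in plain Python, not PyTorch code. It assumes you already know roughly how much memory each consecutive block of layers needs; the sizes and the `assign_layers` helper are hypothetical and only illustrate the bookkeeping. In PyTorch you would then move each group of layers to its device with `.to("cuda:0")`, `.to("cuda:1")` and so on, and move the activations between devices in `forward`.

```python
def assign_layers(layer_sizes_gib, device_budget_gib, num_devices):
    """Greedily place consecutive layers on devices, in order.

    Returns a list with one device index per layer. This is a
    hypothetical helper for illustration, not a PyTorch API.
    """
    placement = []
    device = 0
    used = 0
    for size in layer_sizes_gib:
        if size > device_budget_gib:
            raise ValueError("a single layer exceeds the per-device budget")
        if used + size > device_budget_gib:
            # Current device is full: continue on the next one.
            device += 1
            used = 0
            if device >= num_devices:
                raise ValueError("model does not fit on the available devices")
        placement.append(device)
        used += size
    return placement


# Made-up numbers: four blocks totalling 16 GiB, two cards with 11 GiB each.
layer_sizes = [3, 4, 5, 4]
plan = assign_layers(layer_sizes, device_budget_gib=11, num_devices=2)
print(plan)
```

The greedy split keeps layers in order so that activations only cross the device boundary once per boundary; it ignores activation memory and framework overhead, so treat the budget as optimistic.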
Perhaps if GPUDirect [1] were available on consumer devices, it would be possible to DMA training/model data in from an SSD/Optane/hypothetical DRAM-based PCIe drive and scale up to bigger problems as a result.

Alas, not even AMD allows access to their GPUDirect equivalent, and I can't imagine anyone being able to drum up enough pressure for those vendors to flip the toggle to make them available on their consumer lines.

[1]: https://developer.nvidia.com/gpudirect

GPUDirect is one of the last differentiators between the pro cards and the consumer cards. The 2080 Ti brought tensor cores to consumer cards, so I would bet they'll never enable GPUDirect there, since they need something to set the pro line apart.
I know it's going to become an even bigger bottleneck moving forward, but it really raises the barrier for us hobbyists.

I own a 1080ti (12GB RAM) - and I consider this "high-end" for many people who aren't actively employed for machine learning (College kids and younger especially). I know you can "use the cloud" but I would really prefer not to...

Yea some state of the art results are just inaccessible without large budgets, simply because models can scale (and because some orgs have a lot of money to train those scaled models).

You can always just use smaller models and/or lower resolutions though; of course the results won't be on par, but you may still reach a qualitative result (for research and experimentation purposes) or a good-enough result (for personal application purposes). E.g. hobbyists don't need an AlphaGo-level Go-playing AI (which I'm sure had aggregate costs in 5 figures or more to train); reduced versions all play far above our level -- although in this case there's the interesting effort of pooling hobbyist resources to indeed reach SOTA, see LeelaZero[1] and LCZero.

Some kinds of research will be effective only at large orgs; that's always been true. There was indeed a brief period, when people realized GPUs could unleash deep learning/CNNs, during which you could do anything with a good GPU, but that was very much an exception. To borrow from another field, you cannot do a certain level of car engine research without all the infrastructure to fabricate and test engine prototypes (though you can do some/other kinds of theoretical analysis).

[1] http://zero.sjeng.org/home

There's some work on enabling larger stuff to run (slowly) via oversubscription. Maybe PyTorch will get this capability eventually.

https://developer.download.nvidia.com/video/gputechconf/gtc/...

1080ti is 11GB, not 12. Titan is 12
This would be a good test for AMD ROCm - their Radeon VII (Vega) has 16GB of memory, supports PyTorch, costs $700. Would the pretrained model load on their GPU?
Wow yeah looks like you’d need a Titan RTX, which is $2500. Crazy expensive.
Can you do that fp16 trick and use an 8GB card?

Edit: > GPU Memory >= 10G (for fp16)
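
For a rough feel of why fp16 helps but does not simply halve the requirement, here is a back-of-the-envelope sketch. It only counts the memory for the weights themselves; the parameter count is a made-up example, not the size of the model discussed here, and `weights_gib` is a hypothetical helper.

```python
# Bytes needed to store one parameter in each format.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}


def weights_gib(num_params, dtype):
    """Memory taken by the weights alone, in GiB."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 2**30


# Hypothetical model size, chosen only for illustration.
num_params = 1_000_000_000

fp32_gib = weights_gib(num_params, "fp32")
fp16_gib = weights_gib(num_params, "fp16")
print(f"fp32 weights: {fp32_gib:.2f} GiB")
print(f"fp16 weights: {fp16_gib:.2f} GiB")
```

The weights shrink by exactly half, but total usage also includes activations, workspace buffers and framework overhead, which do not necessarily shrink by the same factor. That would be consistent with the stated requirement dropping from 16GB to 10GB rather than to 8GB.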