by BaculumMeumEst
1133 days ago
Thanks, that makes sense and helps a lot. I have a 16GB M1 that I got LLaMA 13B running on. It works really well, but I really want to run bigger models, so your examples of RAM -> model size are super helpful. I’ll probably just end up getting a higher-capacity Mac in the next few years. Right now 96GB configurations seem to be around 4k; if that comes down a bit in the future, I’ll probably pick something up. I’m not really looking to train models myself, so training cost isn’t an issue for me personally. I just want to be able to run the best of what the open source community comes up with (or contribute to a pool to train models, if that becomes a thing).
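For anyone else doing the RAM -> model size math, a rough rule of thumb is: weight memory ≈ parameter count × bits per weight / 8. The sketch below is only an estimate under stated assumptions: 1 GB = 1e9 bytes, 4-bit quantized weights, and a flat 20% headroom factor for the KV cache, activations and the OS (the 20% is an illustrative guess, not a measurement).

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB: billions of params * bytes per weight."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         memory_gb: float, overhead: float = 1.2) -> bool:
    """True if the weights plus an assumed 20% headroom fit in memory_gb."""
    return weights_gb(params_billions, bits_per_weight) * overhead <= memory_gb


if __name__ == "__main__":
    for size in (7, 13, 30, 65):
        q4 = weights_gb(size, 4)
        print(f"{size}B @ 4-bit: ~{q4:.1f} GB | "
              f"fits 16GB: {fits(size, 4, 16)} | fits 96GB: {fits(size, 4, 96)}")
```

Under these assumptions a 4-bit 13B model (~6.5 GB of weights) fits comfortably in 16GB while a 30B model does not, and a 96GB machine would hold a 4-bit 65B model; the same 65B model at 16-bit (~130 GB) would not fit.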
Admittedly, I'm not sure how well they work if you stream/batch to the GPU (say 96GB of system RAM + a GPU with 24GB of VRAM).
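LLaMA-style models are a stack of identical transformer layers, so one way to combine system RAM and a GPU is to put as many whole layers as fit in VRAM and leave the rest in system RAM. A minimal sketch of that memory split, assuming every layer has the same size and that a fixed 2 GB of VRAM (an illustrative guess) is reserved for the KV cache and driver context. The layer counts used below (60 for 30B, 80 for 65B) are the published LLaMA configurations; the sizes reuse the 4-bit estimate of params × bits / 8.

```python
import math


def split_layers(total_weights_gb: float, n_layers: int,
                 vram_gb: float, reserve_gb: float = 2.0) -> tuple[int, int]:
    """Return (layers_on_gpu, layers_in_system_ram).

    Assumes weights are spread evenly across layers and reserve_gb of VRAM
    is kept free (assumption, not a measured figure).
    """
    per_layer_gb = total_weights_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(n_layers, math.floor(usable_gb / per_layer_gb))
    return on_gpu, n_layers - on_gpu


if __name__ == "__main__":
    # 65B at 4-bit is ~32.5 GB of weights over 80 layers; 24GB card.
    print(split_layers(32.5, 80, 24))  # (54, 26)
    # 30B at 4-bit is ~15 GB over 60 layers: everything fits on the card.
    print(split_layers(15.0, 60, 24))  # (60, 0)
```

This only shows how the memory would split; it says nothing about speed. The layers left in system RAM run on the CPU (or get copied over the PCIe bus), so they are likely to dominate the per-token time.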
I've heard used NVIDIA workstation cards with more than 24GB of VRAM are reasonably cheap.
A 3090 or 4090 has 24GB of VRAM and can run models up to 30B with some optimizations (e.g. 4-bit quantization). This is the easiest way to run the 30B models, which are essentially the highest end any consumer card can run. If you also play games and have the money, this is the way to go IMO.
If you were to get a GPU, it must have CUDA support (so NVIDIA only) unless you want a headache.