Hacker News
by zamnos 1135 days ago
It's not crazy to want to train or run models like these; it's actually quite popular right now! :) The question for you to answer is how handy you are with scikit-learn and pandas, and how much you want to be on the bleeding edge of things. Most stuff is coming out for CUDA first, since that's what the industrial-grade GPUs (A100s) use, so with Apple Arm you either have to wait for someone to port it or port it yourself.

On the other hand, getting more than 8 GiB of VRAM on a laptop GPU is rare, and you're definitely not getting 128 GiB of VRAM, so Apple Arm, with 32 or 64 GiB of RAM (get 128 if you can afford it), is going to get you more gigabytes of usable RAM for training/inference.


Yeah. It seems to me that it's really hard to get more than 10-14 GB of VRAM without using some sort of hyper-expensive cluster. What would it cost if you wanted to do it with Nvidia? Being able to share ordinary RAM with the GPU in a Mac could maybe be a unique value proposition.
An RTX 3090 or 4090 gets you 24 GB of VRAM, which is enough to run llama-30b (quantized to 4-bit with a groupsize of 1024 or higher) at speeds comparable to ChatGPT. You can also get two cards and run the model split across them, although pumping data back and forth slows things down.

A brand-new RTX A6000 (48 GB VRAM) is probably the largest you can get in a single card that can run in a regular PC. It can be had for $4-5k and is sufficient for llama-65b.

Beyond that, yeah, you're looking at dedicated multi-GPU server hardware.
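
For a rough sanity check on those numbers, here is a back-of-the-envelope sketch of the weight memory at a given bit width. The parameter counts are just the nominal figures from the model names (an assumption, the real counts differ a little), and the estimate covers weights only: it ignores quantization scales, the KV cache, and activations, so treat "fits" as a lower bound rather than a guarantee.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal parameter counts read off the model names (assumption).
MODELS = {"llama-30b": 30e9, "llama-65b": 65e9}

# VRAM sizes quoted in the thread.
CARDS_GB = {"RTX 3090/4090": 24, "RTX A6000": 48}


def fits(model: str, card: str, bits: int = 4) -> bool:
    """True if the weights alone are smaller than the card's VRAM."""
    return weight_memory_gb(MODELS[model], bits) < CARDS_GB[card]


if __name__ == "__main__":
    for model in MODELS:
        for card in CARDS_GB:
            need = weight_memory_gb(MODELS[model], 4)
            verdict = "fits" if fits(model, card) else "does not fit"
            print(f"{model} at 4-bit: ~{need:.1f} GB of weights, {verdict} in {card}")
```

At 4 bits this gives roughly 15 GB for llama-30b and 32.5 GB for llama-65b, which lines up with the 24 GB and 48 GB cards above; at 16 bits the same 30b model would need about 60 GB and would not fit on either card.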

> It seems to me that it’s really hard to get more than 10-14 GB of VRAM without using some sort of hyper expensive cluster.

Both consumer and workstation GPUs with 16-24 GB (RTX 3080Ti/3090/4090/A4000/A4500/A5000), including in laptops, are not hard to find (pricey, but not “hyperexpensive clusters”); the workstation cards may be cheaper per GB of VRAM, but have fewer shaders. It's not until you jump above a single 48 GB RTX A6000 that you need a “cluster”.