by AnthonyMouse 1153 days ago
Most of these implementations are not platform-specific. I've been running llama.cpp on x86_64 hardware and the performance is fine. The small models are fast and the quantized 65B model generates about a token per second on a system with dual-channel DDR4, which isn't unusable.
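As a rough sanity check on that one-token-per-second figure: CPU inference is usually limited by memory bandwidth, because generating each token means reading roughly all of the weights once. The sketch below is a back-of-envelope estimate, not a benchmark. The DDR4-3200 speed and the ~4.5 bits per weight for a 4-bit quantization are assumptions, not numbers from the comment, and the function names are made up for illustration.

```python
# Back-of-envelope upper bound on tokens/s for bandwidth-bound inference.
# Assumptions (not from the comment): DDR4-3200 memory, 64-bit (8-byte)
# channels, and ~4.5 bits per weight for a 4-bit quantized model.

def peak_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

def model_size_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB."""
    return params_billion * bits_per_weight / 8

def tokens_per_second_bound(bandwidth_gbs, size_gb):
    """Each token reads roughly all weights once, so rate <= bandwidth / size."""
    return bandwidth_gbs / size_gb

ddr4_bw = peak_bandwidth_gbs(channels=2, mega_transfers_per_s=3200)
q4_size = model_size_gb(params_billion=65, bits_per_weight=4.5)
rate = tokens_per_second_bound(ddr4_bw, q4_size)

print(f"bandwidth ~{ddr4_bw:.1f} GB/s, weights ~{q4_size:.1f} GB, "
      f"at most ~{rate:.1f} tokens/s")
```

Under those assumptions the ceiling comes out a little above one token per second, which is in line with what I'm seeing. Real throughput lands below the theoretical peak.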

The tough thing to find is something affordable that will run the unquantized 65B model at an acceptable speed. You can put 128GB of RAM in affordable hardware, but ordinary desktops aren't fast. The things that are fast are expensive (e.g. I bet Epyc 9000 series would do great). And that's the thing Apple doesn't get you either, because Apple Silicon isn't available with that much RAM, and if it were it wouldn't be affordable (the 96GB MacBook Pro, which isn't enough to run the full model, is >$4000).

1 comment

If you want to spend $4800 on just the computer, you can get a Mac Studio with 128GB of memory and 400GB/s of bandwidth. There are sparse reports out there of folks running 65B models on it. I've seen no performance measurements, though.
It's interesting that they actually have it but the price is still silly.

  SP5 system board ~$1000
  Epyc 9124 $1083
  192GB registered DDR5 (12x16GB) ~$1000
  case, power supply, modest storage ~$300
That's 460GB/s of bandwidth from 12 memory channels and 50% more memory than the Mac Studio, and you'd have more than $1000 left over. But >$3000 is not a low price either, it's just lower.
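The arithmetic behind those numbers, as a quick sketch. The prices are the approximate ones listed above. The DDR5-4800 speed and 8-byte channel width are assumptions (the list doesn't state a memory speed), chosen because they reproduce the 460GB/s figure.

```python
# Approximate prices in USD, as listed in the comment above.
parts = {
    "SP5 system board": 1000,
    "Epyc 9124": 1083,
    "192GB registered DDR5 (12x16GB)": 1000,
    "case, power supply, modest storage": 300,
}
mac_studio_price = 4800  # the 128GB Mac Studio quoted in the parent comment

build_total = sum(parts.values())
left_over = mac_studio_price - build_total

# Assumption: DDR5-4800 with 64-bit (8-byte) channels, 12 channels populated.
channels = 12
bandwidth_gbs = channels * 4800 * 8 / 1000

# Memory relative to the 128GB Mac Studio.
memory_ratio = 192 / 128

print(f"build total: ${build_total}, left over: ${left_over}")
print(f"peak bandwidth: {bandwidth_gbs:.1f} GB/s, memory: {memory_ratio:.0%} of 128GB")
```

The bandwidth is a theoretical peak with every channel populated, so treat it as a ceiling rather than something you'd measure.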