by loudmax
267 days ago
There was an interesting post to r/LocalLLaMA yesterday from someone running inference mostly on CPU: https://carteakey.dev/optimizing%20gpt-oss-120b-local%20infe... One of the observations is how much of a difference memory speed and bandwidth make, even for CPU inference. Obviously a CPU isn't going to match a GPU for inference speed, but it's an affordable way to run much larger models than you can fit in 24GB or even 48GB of VRAM. If you do run inference on a CPU, you might benefit from some of the same memory optimizations gamers make: favoring low-latency, overclocked RAM.
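To see why memory bandwidth matters so much, here is a rough back-of-envelope sketch. It assumes token generation is memory-bound, so each generated token requires reading the model's active weights from RAM once. The function names and the 3 GB "active weights per token" figure are hypothetical placeholders for illustration, not numbers from the linked post, and real throughput will land below this ceiling.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s.

    Each 64-bit memory channel moves 8 bytes per transfer, so
    peak = transfers/s * bytes per transfer * number of channels.
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs: float, active_gb_per_token: float) -> float:
    """Upper bound on decode speed, ASSUMING generation is memory-bound:
    every token needs the active weights streamed from RAM once."""
    return bandwidth_gbs / active_gb_per_token


if __name__ == "__main__":
    ACTIVE_GB = 3.0  # hypothetical: GB of weights touched per token
    for label, speed in [("DDR5-4800", 4800), ("DDR5-6400", 6400)]:
        bw = peak_bandwidth_gbs(speed, channels=2)
        ceiling = max_tokens_per_s(bw, ACTIVE_GB)
        print(f"{label} dual-channel: {bw:.1f} GB/s peak, <= {ceiling:.1f} tok/s")
```

Under these assumptions the ceiling scales linearly with RAM transfer rate and channel count, which is why faster memory helps. The sketch models bandwidth only; it says nothing about latency or timings.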