Hacker News new | ask | show | jobs
by remorses 1144 days ago
Using the CPU with quantized weights on GPT models makes sense, an example is llama.cpp, that’s because these models are constrained by memory bandwidth and not compute (low arithmetic density)

https://github.com/ggerganov/llama.cpp

1 comments

LLAMA is a LLM, very little to do with a model trying to learn MNIST. CNNs in particular gain from using a GPU (or ten), as they're optimizing weights for convolutional/spatial kernels.