| HN Mirror


	by remorses 1191 days ago
	Running GPT-style models on the CPU with quantized weights makes sense; llama.cpp is one example. These models are constrained by memory bandwidth rather than compute (low arithmetic intensity), so shrinking the weights helps directly. https://github.com/ggerganov/llama.cpp
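A rough back-of-the-envelope sketch of the bandwidth argument, not anything taken from llama.cpp itself. The function names, the 4096x4096 layer, the 7e9 parameter count, and the 50 GB/s bandwidth figure are all illustrative assumptions. It estimates FLOPs per byte for a single matrix-vector product and the decode-speed ceiling you get if every weight has to be read once per token.

```python
def matvec_intensity(rows, cols, bits_per_weight, activation_bytes=2):
    """FLOPs per byte moved for y = W @ x with a single input vector."""
    flops = 2 * rows * cols  # one multiply and one add per weight
    weight_bytes = rows * cols * bits_per_weight / 8
    vector_bytes = (rows + cols) * activation_bytes  # read x, write y
    return flops / (weight_bytes + vector_bytes)


def tokens_per_second_bound(n_params, bits_per_weight, bandwidth_bytes_per_s):
    """Upper bound on decode speed if every weight is read once per token."""
    model_bytes = n_params * bits_per_weight / 8
    return bandwidth_bytes_per_s / model_bytes


if __name__ == "__main__":
    # Hypothetical layer size, model size, and memory bandwidth.
    for bits in (16, 8, 4):
        ai = matvec_intensity(4096, 4096, bits)
        bound = tokens_per_second_bound(7e9, bits, 50e9)
        print(f"{bits:>2}-bit: {ai:.2f} FLOP/byte, <= {bound:.1f} tokens/s")
```

Under these assumptions the intensity stays at a few FLOPs per byte whatever the bit width, so the ceiling scales with how many bytes the weights occupy, which is why quantization pays off even without a GPU.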

1 comment

kookamamie 1190 days ago

LLaMA is an LLM, which has very little to do with a model trying to learn MNIST. CNNs in particular gain from using a GPU (or ten), as they're optimizing weights for convolutional/spatial kernels.
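One way to put a number on that difference, as a hedged sketch: the same FLOPs-per-byte estimate for a convolution layer. The function name and the layer shape (28x28 feature map, 32 input channels, 64 output channels, 3x3 kernel, fp32) are hypothetical, and the model assumes stride 1, same padding, and that each tensor is moved through memory exactly once.

```python
def conv2d_intensity(height, width, c_in, c_out, kernel,
                     batch=1, bytes_per_value=4):
    """FLOPs per byte for a stride-1, same-padded 2D convolution."""
    outputs = batch * height * width * c_out
    flops = 2 * outputs * c_in * kernel * kernel  # multiply-add per tap
    weight_bytes = c_out * c_in * kernel * kernel * bytes_per_value
    input_bytes = batch * height * width * c_in * bytes_per_value
    output_bytes = outputs * bytes_per_value
    return flops / (weight_bytes + input_bytes + output_bytes)


if __name__ == "__main__":
    # Hypothetical MNIST-sized feature map in a small CNN.
    for batch in (1, 64):
        ai = conv2d_intensity(28, 28, 32, 64, 3, batch=batch)
        print(f"batch={batch}: {ai:.1f} FLOP/byte")
```

Because each kernel weight is reused at every spatial position, the estimate comes out at tens of FLOPs per byte rather than the low single digits of a matrix-vector product, which is the regime where a GPU's compute throughput is actually usable.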
