| HN Mirror



	by kookamamie 1192 days ago
	> I would guess the constant memory allocation and frees in the training loop are the bottleneck

	No, the bottleneck would be not utilizing the idling GPU.

3 comments

remorses 1192 days ago

Using the CPU with quantized weights on GPT models makes sense; llama.cpp is an example. That’s because these models are constrained by memory bandwidth and not compute (low arithmetic intensity).

https://github.com/ggerganov/llama.cpp
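
To put a rough number on "memory bandwidth and not compute": a back-of-the-envelope sketch, assuming single-token generation where each weight matrix is read once for one matrix-vector product. The function name and the 4096x4096 shape are made up for illustration, and the estimate ignores activation traffic and any per-block quantization scales.

```python
def matvec_arithmetic_intensity(rows, cols, bits_per_weight):
    """Approximate FLOPs per byte of weights read for y = W @ x.

    Assumption: only weight traffic is counted; the input and output
    vectors and quantization metadata are ignored.
    """
    flops = 2 * rows * cols                        # one multiply + one add per weight
    weight_bytes = rows * cols * bits_per_weight / 8
    return flops / weight_bytes


if __name__ == "__main__":
    for bits in (16, 8, 4):
        ai = matvec_arithmetic_intensity(4096, 4096, bits)
        print(f"{bits}-bit weights: {ai} FLOPs per byte")
```

Under these assumptions the ratio depends only on the bit width, not the matrix size: each weight is loaded once and used for two operations, so the cores spend most of their time waiting on memory. Quantizing shrinks the bytes moved per weight, which is why it helps on a CPU.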


kookamamie 1191 days ago

LLaMA is an LLM; it has very little to do with a model trying to learn MNIST. CNNs in particular gain from using a GPU (or ten), as they're optimizing weights for convolutional/spatial kernels.


mdp2021 1191 days ago

Please see the post from version_five nearby: https://news.ycombinator.com/item?id=35699260

> simple, purpose written NNs for many simple applications [... as opposed to] python and cuda libraries


mjdowney 1192 days ago

You can go 100x faster using SIMD on the CPU, instead of doing the linear algebra by hand, then another order of magnitude or two on the GPU.
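
A minimal way to see the gap for yourself, assuming NumPy is installed: time a hand-written triple loop against `a @ b`, which NumPy hands to a BLAS library that typically uses SIMD. The helper names are made up for this sketch. The Python loop also pays interpreter overhead, so the ratio measured here is not a pure SIMD speedup and should not be read as confirming the 100x figure.

```python
import time

import numpy as np


def matmul_by_hand(a, b):
    """Naive triple-loop matrix product over nested Python lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def compare(n=64, seed=0):
    """Return (results_match, seconds_by_hand, seconds_numpy) for n x n inputs."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))

    t0 = time.perf_counter()
    slow = matmul_by_hand(a.tolist(), b.tolist())
    t1 = time.perf_counter()
    fast = a @ b  # dispatched to the BLAS backend NumPy was built with
    t2 = time.perf_counter()

    return bool(np.allclose(slow, fast)), t1 - t0, t2 - t1


if __name__ == "__main__":
    ok, by_hand_s, numpy_s = compare()
    print(f"match={ok} by_hand={by_hand_s:.4f}s numpy={numpy_s:.6f}s")
```

Timings vary by machine and BLAS build, so none are quoted here.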
