by kookamamie 1144 days ago
> I would guess the constant memory allocation and frees in the training loop are the bottleneck

No, the bottleneck would be leaving the GPU idle.

3 comments

Running GPT-style models on the CPU with quantized weights makes sense; llama.cpp is one example. That’s because these models are constrained by memory bandwidth, not compute (low arithmetic intensity).

https://github.com/ggerganov/llama.cpp
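
To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes token generation is dominated by matrix-vector products and counts only the weight traffic; the function names and the hardware numbers are illustrative assumptions, not measurements of any real machine.

```python
def matvec_arithmetic_intensity(rows, cols, bytes_per_weight):
    """FLOPs per byte for y = W @ x, counting only the bytes of W read."""
    flops = 2 * rows * cols                 # one multiply and one add per weight
    bytes_read = rows * cols * bytes_per_weight
    return flops / bytes_read


def machine_balance(peak_flops_per_s, bandwidth_bytes_per_s):
    """FLOPs the hardware can execute per byte it can stream from memory."""
    return peak_flops_per_s / bandwidth_bytes_per_s


def is_memory_bound(intensity, balance):
    """A kernel is bandwidth-limited when it needs fewer FLOPs per byte
    than the machine could supply."""
    return intensity < balance


# Hypothetical CPU: 1 TFLOP/s peak compute, 50 GB/s memory bandwidth.
balance = machine_balance(1e12, 50e9)

fp16 = matvec_arithmetic_intensity(4096, 4096, 2.0)   # 16-bit weights
q4 = matvec_arithmetic_intensity(4096, 4096, 0.5)     # 4-bit quantized weights

print(f"machine balance: {balance:.1f} FLOP/byte")
print(f"fp16 matvec: {fp16:.1f} FLOP/byte, memory bound: {is_memory_bound(fp16, balance)}")
print(f"4-bit matvec: {q4:.1f} FLOP/byte, memory bound: {is_memory_bound(q4, balance)}")
```

Under these assumptions both cases stay well below the machine balance, so the time per token is set by how many bytes of weights have to be streamed. Going from 16-bit to 4-bit weights cuts that traffic to a quarter, which is where the benefit of quantization comes from.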

LLaMA is an LLM and has very little to do with a model trying to learn MNIST. CNNs in particular gain from using a GPU (or ten), as they're optimizing weights for convolutional/spatial kernels.
Please see the post from version_five nearby: https://news.ycombinator.com/item?id=35699260

> simple, purpose written NNs for many simple applications [... as opposed to] python and cuda libraries

You can go 100x faster by using SIMD on the CPU instead of doing the linear algebra by hand, and then gain another order of magnitude or two on the GPU.
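
A minimal sketch of that gap, assuming NumPy is installed: the same matrix product written as explicit loops and as a single call into NumPy, which hands the work to an optimized BLAS routine that typically uses SIMD. The helper names are made up for illustration. Note that in Python the measured difference also includes interpreter overhead, so it overstates what SIMD alone buys over hand-written loops in a compiled language.

```python
import time

import numpy as np


def matmul_by_hand(a, b):
    """Naive triple-loop matrix product over nested Python lists."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc
    return out


def compare(n=64, seed=0):
    """Return (results_match, seconds_by_hand, seconds_numpy) for n x n inputs."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))

    t0 = time.perf_counter()
    slow = matmul_by_hand(a.tolist(), b.tolist())
    t1 = time.perf_counter()
    fast = a @ b                      # dispatched to the BLAS backend
    t2 = time.perf_counter()

    return bool(np.allclose(slow, fast)), t1 - t0, t2 - t1


if __name__ == "__main__":
    match, by_hand_s, numpy_s = compare()
    print(f"results match: {match}")
    print(f"by hand: {by_hand_s:.4f} s, NumPy: {numpy_s:.6f} s")
```

The timings depend heavily on the machine and the BLAS build, so treat any ratio you see as indicative only.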