by simonw 785 days ago
Do you have an NVIDIA GPU? I have not had much luck with the transformers library on a Mac.

Of course. I thought Nvidia GPUs were pretty much a must-have for playing with DL models.
Well, being able to run these models on CPU was pretty much the revolutionary part of llama.cpp.
I can run them on CPU. HF uses plain PyTorch code, which is fully supported on CPU.
But it's likely to be much slower than what you'd get with a backend like llama.cpp on CPU (particularly if you're running on a Mac, but I think on Linux as well), and it doesn't support features like CPU offloading.
Are there benchmarks? A 2x speedup would not be enough for me to return to C++ hell, but 5x might be, in some circumstances.
I think the biggest selling point of ollama (llama.cpp) is quantization: for a slight hit in quality (with q8 or q4) you can get a significant performance boost.
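To put rough numbers on the quantization point, here's a back-of-the-envelope sketch of weight memory at different precisions. The function name is made up for illustration, and it assumes nominal bit widths (16, 8, 4) while ignoring the per-block scale overhead that real quantized formats add, so treat the results as lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Hypothetical helper: ignores activations, KV cache, and the small
    per-block scale overhead of real quantized formats.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumption: a 7B-parameter model
    for label, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
        print(f"{label}: ~{weight_memory_gb(n_params, bits):.1f} GB")
```

For a 7B model that works out to about 14 GB at fp16, 7 GB at q8 and 3.5 GB at q4. Less data to move per token is a large part of where the speedup comes from on memory-bound hardware.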
There's a Python binding for llama.cpp which is actively maintained and has worked well for me: https://github.com/abetlen/llama-cpp-python
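In case it helps, a minimal sketch of what using that binding can look like. It assumes `pip install llama-cpp-python` and a GGUF model file you already have on disk; the `complete` and `extract_text` helper names and the `n_ctx` value are my own choices for illustration, not anything from the project.

```python
def extract_text(response: dict) -> str:
    """Pull the generated text out of a completion-style response dict."""
    return response["choices"][0]["text"]


def complete(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Run one completion on a local model (hypothetical helper)."""
    # Imported lazily so this file loads even without the package installed.
    from llama_cpp import Llama

    # n_ctx=2048 is an assumed context size; adjust for your model.
    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    response = llm(prompt, max_tokens=max_tokens)
    return extract_text(response)
```

You'd call it as `complete("/path/to/model.gguf", "Q: Name a planet. A:")`, where the path is a placeholder for whatever quantized model you downloaded.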
Ollama supports many Radeon GPUs now. And I guess llama.cpp does too, since it's what ollama uses as its backend.
PyTorch (the underlying framework of HF) supports AMD as well, though I haven’t tried it.
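For anyone who wants to check, a rough sketch of device selection in PyTorch. As far as I know the ROCm builds expose AMD GPUs through the same `torch.cuda` API, so the "cuda" branch should cover Radeon cards too; I haven't verified that on AMD hardware. The helper names are invented for this example.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Choose a device string: GPU first, then Apple MPS, then CPU."""
    if cuda_available:
        # On ROCm builds of PyTorch, AMD GPUs also show up as "cuda".
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Query the installed PyTorch build (hypothetical wrapper)."""
    # Imported lazily so pick_device can be used without torch installed.
    import torch

    return pick_device(
        torch.cuda.is_available(),
        torch.backends.mps.is_available(),
    )
```

With that, `model.to(detect_device())` falls back to CPU when no accelerator is found.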