Hacker News
by p1esk 785 days ago
I updated my comment above: I’m using the HF transformers repo, which gets its models from the HF hub.

Do you have an NVIDIA GPU? I have not had much luck with the transformers library on a Mac.
Of course. I thought Nvidia GPUs were pretty much a must-have for playing with DL models.
Well, being able to run these models on CPU was pretty much the revolutionary part of llama.cpp.
I can run them on CPU - HF uses plain PyTorch code, which is fully supported on CPU.
But it's likely to be much slower than what you'd get with a backend like llama.cpp on CPU (particularly if you're running on a Mac, but I think on Linux as well), and it doesn't support features like CPU offloading.
Are there benchmarks? A 2x speedup would not be enough for me to return to C++ hell, but 5x might be, in some circumstances.
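In the absence of published numbers, the 2x vs 5x question can be checked locally. This is a minimal sketch, not a real benchmark: `generate` is a hypothetical callable (not part of transformers or llama.cpp) that you would wrap around each backend so that it takes a prompt and returns the generated tokens. Only the standard library is used.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(generate: Callable[[str], Sequence], prompt: str, runs: int = 3) -> float:
    """Best-of-`runs` throughput for a hypothetical generate(prompt) -> tokens callable."""
    best = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate(prompt)  # assumed to return a sequence of generated tokens
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, len(tokens) / elapsed)
    return best


def speedup(baseline_tps: float, candidate_tps: float) -> float:
    """How many times faster the candidate backend is than the baseline."""
    return candidate_tps / baseline_tps
```

Wrap each backend in its own `generate` function, use the same prompt and output length for both, and compare the two throughputs with `speedup`.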
Ollama supports many Radeons now. And I guess llama.cpp does too; after all, it's what Ollama uses as its backend.
PyTorch (the underlying framework of HF) supports AMD as well, though I haven’t tried it.
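For anyone who wants to check which device PyTorch would actually use, here is a minimal sketch. It assumes a reasonably recent PyTorch build; on ROCm (AMD) builds the GPU is exposed through the same `torch.cuda` API, so `"cuda"` here covers both NVIDIA and AMD. The `pick_device` helper is a name made up for this example, and it falls back to `"cpu"` if PyTorch is not installed.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what this PyTorch build can use."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed; nothing to probe
    if torch.cuda.is_available():
        # True on CUDA builds with an NVIDIA GPU and on ROCm builds with a supported AMD GPU
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch versions
    if mps is not None and mps.is_available():
        return "mps"  # Apple Silicon GPU
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

The returned string can be passed to `.to(...)` on a model or tensor.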