by kkielhofner
1192 days ago
llama.cpp doesn't use the GPU at all. The genius *.cpp projects (whisper.cpp, llama.cpp) are specifically intended to optimize/democratize otherwise GPU-only (CUDA, ROCm) models so that they run on CPU. Technically speaking, the released models can already run on CPU via the standard frameworks' (PyTorch, TensorFlow) CPU support, but in practice, without a lot of optimization, they are incredibly slow to the point of being useless, hence *.cpp. You want something along these lines (warning: unnecessarily, potentially offensive): https://rentry.org/llama-tard-v2