by rubiquity
56 days ago
Have you tried running llama.cpp with Unified Memory Access[1] so your iGPU can seamlessly grab some of the system RAM? The environment variable is prefixed with CUDA, but it is not CUDA-specific. It made a pretty significant difference (over 40% more tg/s, i.e. token-generation throughput) on my Ryzen 7840U laptop.

1 - https://github.com/ggml-org/llama.cpp/blob/master/docs/build...
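If you want to check the difference on your own machine, here is a rough sketch that runs `llama-bench` twice, once without and once with unified memory, so you can compare the tg numbers it prints. It assumes the variable is `GGML_CUDA_ENABLE_UNIFIED_MEMORY` (the name I believe the llama.cpp build docs use, also for the HIP/ROCm build), that `llama-bench` is on your PATH, and that you pass a GGUF model path. The helper `build_bench_command` is just a hypothetical name for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

# Assumption: this is the variable llama.cpp reads to enable unified memory.
UMA_VAR = "GGML_CUDA_ENABLE_UNIFIED_MEMORY"


def build_bench_command(model_path: str, unified_memory: bool) -> Tuple[List[str], Dict[str, str]]:
    """Hypothetical helper: return the llama-bench argv and environment for one run."""
    env = dict(os.environ)
    if unified_memory:
        env[UMA_VAR] = "1"
    else:
        # Make sure an inherited value doesn't leak into the baseline run.
        env.pop(UMA_VAR, None)
    # -m selects the GGUF model; -ngl 99 offloads all layers to the GPU backend.
    argv = ["llama-bench", "-m", model_path, "-ngl", "99"]
    return argv, env


def run(model_path: str) -> None:
    """Run the benchmark with unified memory off, then on."""
    for uma in (False, True):
        argv, env = build_bench_command(model_path, uma)
        print(f"unified memory {'on' if uma else 'off'}:", flush=True)
        subprocess.run(argv, env=env, check=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

Usage would be something like `python bench_uma.py ~/models/some-model.gguf`, then comparing the `tg128` rows of the two tables.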