by ilaksh
911 days ago
Try llama.cpp: https://github.com/ggerganov/llama.cpp. It builds very quickly with make. If it's slow when you try it, make sure to enable the build flags related to CUDA and then rebuild. A key parameter is the one that tells it how many layers to offload to the GPU: -ngl (--n-gpu-layers), I think. Also, download a 4-bit GGUF of the model from HuggingFace and try that. It uses much less memory.
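
For what it's worth, here is a rough Python sketch of how the run command could be put together once the build is done. The flags -m, -ngl and -p are llama.cpp options, but everything else is an assumption for illustration: the binary name (./llama-cli here; older builds call it ./main), the model filename, and the layer count of 32, which you would tune to whatever fits in your VRAM.

```python
import shlex


def build_llama_command(model_path, n_gpu_layers, prompt, binary="./llama-cli"):
    """Return the argument list for one llama.cpp run with GPU offload.

    binary is an assumption: adjust it to whatever your build produced.
    """
    if n_gpu_layers < 0:
        raise ValueError("n_gpu_layers must be >= 0")
    return [
        binary,
        "-m", model_path,            # path to the GGUF model file
        "-ngl", str(n_gpu_layers),   # number of layers to offload to the GPU
        "-p", prompt,                # prompt text
    ]


# Hypothetical model path and layer count, for illustration only.
cmd = build_llama_command("models/model-q4_k_m.gguf", 32, "Hello")
print(shlex.join(cmd))
```

You can paste the printed line into a shell, or pass cmd to subprocess.run. If you run out of GPU memory, lower the layer count and try again.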