by car
122 days ago
Building llama.cpp from source with CUDA enabled should get you pretty far. llama-server has a really good web UI, and the latest version supports model switching. As for models, there are plenty of GGUF quantizations (down to 2-bit) available on HF and ModelScope.
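To check whether a given quantization will fit on your GPU, a back-of-the-envelope estimate of the weight size is parameter count × bits per weight ÷ 8. The sketch below is only a rough lower bound. It assumes a hypothetical 7B-parameter model and ignores GGUF metadata, the per-block scales that real quant formats add (so a "2-bit" file is usually somewhat larger than the nominal figure), and the KV cache that llama.cpp needs at runtime.

```python
def approx_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weight-only size in GB (1e9 bytes).

    Assumption: every weight uses exactly `bits_per_weight` bits. Real GGUF
    quants add block scales and metadata, so actual files are somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical 7B-parameter model at a few common bit widths.
    for bits in (16, 8, 4, 2):
        print(f"7B model at {bits}-bit: ~{approx_gguf_size_gb(7e9, bits):.2f} GB")
```

By this estimate, a 7B model drops from about 14 GB at 16-bit to under 2 GB at 2-bit, which is why the low-bit quants are attractive on smaller cards.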