by horsawlarway
1071 days ago
I started here: https://github.com/ggerganov/llama.cpp

It won't run everything, but it will run models in the GGML format, such as https://huggingface.co/TheBloke/llama-65B-GGML

The steps are basically:

1. Download a model.
2. Make sure you have the latest NVIDIA driver for your machine, along with the CUDA toolkit. This will vary by OS but is fairly easy on most Linux distros.
3. Compile https://github.com/ggerganov/llama.cpp following their instructions (in particular, look for LLAMA_CUBLAS for enabling GPU support).
4. Run the model following their instructions. There are several flags that are important, but you can also just use their server example that was added a few days ago; it gives a fairly solid chat interface.
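
For the compile and run steps, here is a rough sketch of what the commands might look like, written as a small Python script that only prints them (it does not run anything). Several things here are my assumptions and not from the comment above: the `make LLAMA_CUBLAS=1` build form, the `main` binary with its `-m`, `-ngl` and `-p` flags as they existed in llama.cpp around that time, and the model filename, which is a made-up placeholder. Check the project README for whatever the current options are.

```python
import shlex

REPO_URL = "https://github.com/ggerganov/llama.cpp"


def build_steps(model_path, gpu_layers=32, prompt="Hello"):
    """Return the clone, build, and run commands as argument lists.

    Assumptions (not from the original comment): the Makefile build with
    LLAMA_CUBLAS=1, and the `main` example binary taking -m (model file),
    -ngl (number of layers to offload to the GPU) and -p (prompt).
    """
    repo_dir = "llama.cpp"
    return [
        # Fetch the source.
        ["git", "clone", REPO_URL, repo_dir],
        # Build with cuBLAS so layers can be offloaded to the GPU.
        ["make", "-C", repo_dir, "LLAMA_CUBLAS=1"],
        # Run a single prompt against the downloaded GGML model.
        [f"{repo_dir}/main", "-m", model_path,
         "-ngl", str(gpu_layers), "-p", prompt],
    ]


def render(steps):
    """Format each command as one shell-quoted line."""
    return "\n".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    # Hypothetical filename; use whichever GGML file you actually downloaded.
    print(render(build_steps("models/llama-65b.ggml.q4_0.bin")))
```

The `gpu_layers` value is the main thing to tune: a higher number puts more of the model on the GPU, limited by how much VRAM you have.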