by tarruda
1144 days ago
3 tokens/sec is a lot faster than what I experienced. Even though your CPU has a lot more cores, I think llama.cpp was not able to make good use of more than 8 threads. When did you test this? Maybe llama.cpp has had some improvements since I used it (which was at the start of the project).