Hacker News
by Remmy 1108 days ago
I've been using llama.cpp with the Python wrappers and the speed increase has been great, but it seemed to be limited to a max of 40 for N_GPU_LAYERS. Going to have to update and see what sort of improvement I get.