by aziis98
53 days ago
> I was able to get I think 20 or 30 tokens per second on CPU (DDR4 ram) alone

What are you using for inference? I have a recent Intel laptop with 32GB of DDR5 and I am getting at most 25 tps with the llama.cpp Vulkan backend (that is the fastest; I also tried SYCL, but it is a bit slower).