Hacker News
by
asne11
743 days ago
"slot" is a processing unit. Either GPU or CPU. I believe `llama.c` is only CPU so I'm guessing 1 slot = 1 core (or thread)?
3 comments
mcharytoniuk
743 days ago
It divides the context into smaller "slots", so it can process requests concurrently with continuous batching. See also:
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
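To make the idea concrete, here is a rough sketch of slots plus continuous batching. It is not llama.cpp's actual code: the names `Slot`, `make_slots` and `step` are made up for illustration, and it assumes the total context is split evenly across slots and that each step produces one token per busy slot.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    slot_id: int
    n_ctx: int                     # tokens of context reserved for this slot
    request: Optional[str] = None  # name of the request being served, if any
    remaining: int = 0             # tokens still to generate for that request


def make_slots(total_ctx: int, n_slots: int) -> list:
    # Assumption: the context window is divided evenly between the slots.
    per_slot = total_ctx // n_slots
    return [Slot(i, per_slot) for i in range(n_slots)]


def step(slots: list, queue: deque) -> list:
    """Run one batching step and return the names of requests that finished."""
    # Idle slots pick up waiting requests without waiting for the others.
    for s in slots:
        if s.request is None and queue:
            name, n_tokens = queue.popleft()
            s.request, s.remaining = name, min(n_tokens, s.n_ctx)
    # Every busy slot advances by one token in the same batch.
    finished = []
    for s in slots:
        if s.request is not None:
            s.remaining -= 1
            if s.remaining <= 0:
                finished.append(s.request)
                s.request = None
    return finished
```

With 4 slots and a 4096-token context, each slot gets 1024 tokens, and a slot that finishes early is reused on the next step while the longer requests keep running.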
kgeist
743 days ago
Llama.cpp can run on CPU, on GPU, or in mixed mode (some layers run on CPU and some on GPU if you don't have enough VRAM).
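A back-of-the-envelope sketch of how a mixed-mode split can be estimated. The function `split_layers` and the byte figures are hypothetical, and it assumes every layer is the same size and ignores the KV cache and other overhead. In llama.cpp itself the number of offloaded layers is something you set yourself, with `-ngl` / `--n-gpu-layers`.

```python
def split_layers(n_layers: int, layer_bytes: int, vram_bytes: int) -> tuple:
    """Return (layers_on_gpu, layers_on_cpu) for a given VRAM budget."""
    # Put as many whole layers on the GPU as fit; the rest stay on the CPU.
    n_gpu = min(n_layers, vram_bytes // layer_bytes)
    return n_gpu, n_layers - n_gpu


# Hypothetical numbers: 32 layers of 200 MB each, 4000 MB of free VRAM.
MB = 1024 * 1024
gpu_layers, cpu_layers = split_layers(32, 200 * MB, 4000 * MB)
```

If everything fits in VRAM, the CPU count comes out as zero, which is the pure-GPU case.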
new299
743 days ago
llama.cpp is not CPU only…