Hacker News
by mcharytoniuk, 743 days ago
It divides the context into smaller "slots", so it can process requests concurrently with continuous batching. See also:
https://github.com/ggerganov/llama.cpp/tree/master/examples/...
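To make the slot idea concrete, here is a toy Python simulation. It is only a sketch of the general technique, not the actual llama.cpp code: the names `Request`, `run_slots`, `n_ctx`, and `n_slots` are made up for illustration, and the even split of the context across slots is an assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    tokens_needed: int  # tokens to generate (hypothetical, for illustration)
    tokens_done: int = 0


def run_slots(requests, n_ctx=4096, n_slots=4):
    """Toy simulation: split n_ctx evenly into n_slots and batch continuously."""
    slot_ctx = n_ctx // n_slots   # context budget per slot (assumed even split)
    queue = deque(requests)
    slots = [None] * n_slots      # None means the slot is idle
    finished = []                 # (name, step) in completion order
    step = 0
    while queue or any(s is not None for s in slots):
        # Continuous batching: refill idle slots before every step
        # instead of waiting for the whole batch to finish.
        for i in range(n_slots):
            if slots[i] is None and queue:
                slots[i] = queue.popleft()
        step += 1
        # One decode step: every busy slot advances by one token.
        for i, req in enumerate(slots):
            if req is None:
                continue
            req.tokens_done += 1
            # A request stops when it is done or its slot's budget runs out.
            if req.tokens_done >= min(req.tokens_needed, slot_ctx):
                finished.append((req.name, step))
                slots[i] = None   # slot is free for the next queued request
    return slot_ctx, finished


if __name__ == "__main__":
    reqs = [Request("a", 3), Request("b", 1), Request("c", 2)]
    print(run_slots(reqs, n_ctx=8, n_slots=2))
```

With two slots, request "c" takes over the slot freed by "b" after the first step, while "a" is still running. That is the point of continuous batching: a short request does not have to wait for the longest one in its batch. The trade-off in this sketch is that each request only gets `n_ctx // n_slots` tokens of context.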