| HN Mirror

	by mcharytoniuk 743 days ago
	It divides the context window into smaller "slots", so it can serve several requests concurrently using continuous batching. See also: https://github.com/ggerganov/llama.cpp/tree/master/examples/...
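
To make the slot idea concrete, here is a rough toy sketch in Python. It is not llama.cpp's actual code: the `Request` class, the `run` function, and the even split of the context across slots are all assumptions made for illustration. It only shows the scheduling pattern, where a slot that frees up is refilled from the queue on the very next step instead of waiting for the whole batch to finish.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    tokens_left: int  # tokens this request still has to generate


def run(total_ctx, n_slots, requests):
    """Toy continuous-batching loop (hypothetical, for illustration only).

    Returns (context size per slot, order in which requests finished, steps taken).
    """
    slot_ctx = total_ctx // n_slots  # assumption: context is split evenly across slots
    slots = [None] * n_slots         # None marks a free slot
    queue = deque(requests)
    finished = []
    steps = 0

    while queue or any(s is not None for s in slots):
        # Refill free slots immediately rather than waiting for the batch to drain.
        for i in range(n_slots):
            if slots[i] is None and queue:
                slots[i] = queue.popleft()

        # One decode step advances every active slot by one token.
        for i, req in enumerate(slots):
            if req is None:
                continue
            req.tokens_left -= 1
            if req.tokens_left <= 0:
                finished.append(req.name)
                slots[i] = None  # slot becomes available on the next step
        steps += 1

    return slot_ctx, finished, steps


if __name__ == "__main__":
    reqs = [Request("a", 3), Request("b", 1), Request("c", 2)]
    print(run(4096, 2, reqs))
```

With two slots, request "c" starts as soon as "b" finishes, while "a" is still generating, so all three requests complete in three steps rather than the five it would take with "c" waiting for the first batch to end. The trade-off in this sketch is that each request only gets `total_ctx // n_slots` tokens of context.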