HN Mirror

Hacker News


	by Octoth0rpe 87 days ago
	> A single patched llama-server runs on K3s, providing both generation with speculative decoding (~100 tok/s)

	There seems to be at least some detail on that point.