Layer-wise inferencing and batching: Small VRAM doesn't limit LLM throughput

	Layer-wise inferencing and batching: Small VRAM doesn't limit LLM throughput (verdagon.dev)
	2 points by verdagon 768 days ago