Layer-wise inferencing and batching: Small VRAM doesn't limit LLM throughput

	(verdagon.dev)
	5 points by one-punch 771 days ago