Cutting LLM Batch Inference Time by Half with Dynamic Prefix Bucketing

	Cutting LLM Batch Inference Time by Half with Dynamic Prefix Bucketing (daft.ai)
	2 points by DISCURSIVE 216 days ago