LLM Inference with Ray: Expert parallelism and prefill/decode disaggregation

LLM Inference with Ray: Expert parallelism and prefill/decode disaggregation (anyscale.com)
1 point by mycelia 204 days ago