Low-Latency Inference with Speculative Decoding on D-Matrix Corsair and GPU

	Low-Latency Inference with Speculative Decoding on D-Matrix Corsair and GPU (gimletlabs.ai)
	1 point by nserrino 103 days ago