| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by WhitneyLand 767 days ago

I don’t see where similar wins have ever been achieved.

Speculative decoding can reduce latency, but at the cost of using a lot more compute. The amazing thing here is latency and global throughput improvements would be realized because of the increase in efficiency.

From what I understand speculative decoding can also come with more challenges insofar as trying to maintain overall output quality.