by WhitneyLand
767 days ago
I don’t see where similar wins have ever been achieved. Speculative decoding can reduce latency, but at the cost of using a lot more compute. The amazing thing here is that both the latency and the global throughput improvements would come from the increase in efficiency. From what I understand, speculative decoding can also make it harder to maintain overall output quality.