|
|
|
|
|
by ow5
419 days ago
|
|
Hi! one of the contributors to the paper — we have kernels not released yet that can shave down decoding latency by >20%. Also when we ran experiments for streaming with the current kernels, we were median ~1.3x slower at inference |
|