Hacker News
by boltzmann-brain 113 days ago
> Our implementation is up to 2x faster than optimized speculative decoding baselines and up to 5x faster than autoregressive decoding with open source inference engines

What about per-FLOP? That is, how do these speedups look once you normalize by the compute spent per generated token?
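A wall-clock speedup and a per-FLOP comparison can point in different directions. The rough sketch below uses the standard speculative sampling analysis: a draft model proposes `gamma` tokens, each accepted independently with rate `alpha`, and the target model verifies them in one pass. It is not the quoted implementation's method, which the comment doesn't describe. `draft_cost` (draft forward-pass FLOPs as a fraction of one target forward pass) is a hypothetical parameter. The sketch also assumes that verifying `gamma + 1` positions costs about `gamma + 1` target single-token passes, which ignores attention's length-dependent terms.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens produced per target verification step, assuming each
    drafted token is accepted independently with probability alpha."""
    if alpha == 1.0:
        return float(gamma + 1)
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def flops_per_token_ratio(alpha: float, gamma: int, draft_cost: float) -> float:
    """FLOPs per generated token relative to plain autoregressive decoding
    (where each token costs 1 unit = one target forward pass for one token).

    Assumptions (hypothetical, for illustration only):
      - the draft model runs gamma single-token passes, each costing draft_cost units
      - target verification of gamma + 1 positions costs gamma + 1 units
    """
    step_flops = gamma * draft_cost + (gamma + 1)
    return step_flops / expected_tokens_per_step(alpha, gamma)


if __name__ == "__main__":
    for alpha in (0.6, 0.8, 0.9):
        ratio = flops_per_token_ratio(alpha, gamma=4, draft_cost=0.05)
        print(f"alpha={alpha}: {ratio:.2f}x the FLOPs per token of autoregressive decoding")
```

Under these assumptions the ratio never drops below 1, because at most `gamma + 1` tokens come out of a step that costs at least `gamma + 1` units. Speculative decoding therefore spends extra FLOPs per token. Its wall-clock gains come from doing fewer sequential target passes, which helps when small-batch decoding is memory-bandwidth bound rather than compute bound. A per-FLOP number would show how much of the reported speedup is paid for with otherwise idle compute.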