Hacker News
by Bharath1234 1234 days ago
Since the Transformer decoder generates tokens autoregressively, and so cannot be parallelized across output positions during inference, how can it be cost-effective at scale?
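
To make the constraint concrete, here is a minimal sketch of a greedy decoding loop. It is not any particular library's implementation: `toy_next_token` is a hypothetical stand-in for a real decoder forward pass plus argmax, and `greedy_decode` is an illustrative name. The point is only that step t cannot start until step t-1 has produced its token.

```python
from typing import Callable, List, Optional, Tuple


def toy_next_token(prefix: List[int], vocab_size: int = 10) -> int:
    """Hypothetical stand-in for a decoder forward pass followed by argmax.

    A real model would attend over the whole prefix; here we just hash it
    so the output depends on every token seen so far.
    """
    return (sum(prefix) * 31 + len(prefix)) % vocab_size


def greedy_decode(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: Optional[int] = None,
) -> Tuple[List[int], int]:
    """Generate tokens one at a time; returns (tokens, sequential_steps)."""
    tokens = list(prompt)
    steps = 0
    for _ in range(max_new_tokens):
        # The input to this call includes the token produced by the
        # previous iteration, so the iterations cannot run concurrently.
        tok = next_token(tokens)
        steps += 1
        tokens.append(tok)
        if eos_id is not None and tok == eos_id:
            break
    return tokens, steps


if __name__ == "__main__":
    out, steps = greedy_decode(toy_next_token, [1, 2, 3], max_new_tokens=5)
    print(out, steps)
```

Generating N new tokens takes N sequential calls here, whereas the prompt itself is already known and can be processed in a single pass.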