Hacker News
by maxwells-daemon 1771 days ago
Autoregressive transformers are slow at generating text, since you need to run the whole model once for every token (roughly, every word or word piece) in the output.
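To make the cost concrete, here is a minimal sketch of a greedy decoding loop. It is not any real library's API: `make_toy_model`, `forward`, and `generate` are hypothetical names, and the "model" is a trivial stand-in that just counts how often it gets called. The point is only that the loop runs one full forward pass per generated token.

```python
from typing import Callable, Dict, List, Tuple


def make_toy_model(vocab_size: int = 5) -> Tuple[Callable[[List[int]], List[float]], Dict[str, int]]:
    """Build a hypothetical stand-in for a transformer, plus a call counter."""
    calls = {"n": 0}

    def forward(tokens: List[int]) -> List[float]:
        # A real model would run every layer over the sequence here.
        calls["n"] += 1
        # Toy rule (an assumption for illustration): predict last token + 1.
        predicted = (tokens[-1] + 1) % vocab_size
        return [1.0 if i == predicted else 0.0 for i in range(vocab_size)]

    return forward, calls


def generate(forward: Callable[[List[int]], List[float]],
             prompt: List[int],
             num_new_tokens: int) -> List[int]:
    """Greedy autoregressive decoding: one forward pass per new token."""
    tokens = list(prompt)
    for _ in range(num_new_tokens):
        scores = forward(tokens)  # full model run
        next_token = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(next_token)  # fed back in on the next step
    return tokens


forward, calls = make_toy_model()
output = generate(forward, prompt=[0], num_new_tokens=4)
print(output, calls["n"])
```

Each step depends on the token chosen in the previous step, so the passes cannot simply be run in parallel; generating N tokens this way means N sequential model runs.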