Hacker News
by
yshui
534 days ago
Any autoregressive model can do what you are describing. Transformers generate one token at a time too, not all at once.
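To make the "one token at a time" point concrete, here is a minimal sketch of an autoregressive decoding loop. The names `generate` and `toy_model` are made up for illustration, and `toy_model` is only a stand-in for a real network (a transformer, an RNN, or a state space model would slot in the same way).

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             n_new: int) -> List[int]:
    """Generic autoregressive loop: each new token is predicted from
    everything generated so far, then appended to the sequence."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(next_token(tokens))
    return tokens


def toy_model(tokens: List[int]) -> int:
    """Hypothetical stand-in for a real model: predicts (last token + 1) mod 10."""
    return (tokens[-1] + 1) % 10


if __name__ == "__main__":
    print(generate(toy_model, [7], 5))  # [7, 8, 9, 0, 1, 2]
```

The loop is identical for any model family; what differs is how much work and memory each call to `next_token` needs.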
2 comments
intalentive
533 days ago
True, but a transformer's memory requirements grow with sequence length, since it keeps keys and values for every previous token. For recurrent models the memory requirement is constant. This is why I qualified with "low memory".
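A rough sketch of the difference, counting only the floats kept between decoding steps. The sizes (`d_model=64`, a single layer) and the function names are assumptions chosen for illustration, not measurements of any real model.

```python
from typing import List, Tuple


def kv_cache_floats(seq_len: int, d_model: int, n_layers: int = 1) -> int:
    """Floats in a transformer KV cache: one key vector and one value
    vector per token, per layer, so it grows linearly with seq_len."""
    return 2 * seq_len * d_model * n_layers


def recurrent_state_floats(d_state: int, n_layers: int = 1) -> int:
    """Floats in a recurrent model's state: one fixed-size vector per
    layer, independent of how many tokens have been processed."""
    return d_state * n_layers


def memory_profile(steps: List[int], d_model: int = 64) -> List[Tuple[int, int, int]]:
    """Return (tokens, kv_cache_floats, recurrent_state_floats) per step count."""
    return [(t, kv_cache_floats(t, d_model), recurrent_state_floats(d_model))
            for t in steps]


if __name__ == "__main__":
    for t, kv, rnn in memory_profile([1, 10, 100, 1000]):
        print(f"tokens={t:5d}  kv_cache={kv:8d}  recurrent_state={rnn:4d}")
```

With these toy numbers the cache goes from 128 floats at 1 token to 128,000 at 1,000 tokens, while the recurrent state stays at 64 throughout.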
whimsicalism
533 days ago
Yes, but transformers are much slower than state space models.