by howrar 943 days ago
Every token is already being generated with all previously generated tokens as inputs. There's nothing about the architecture that makes this hard. It just hasn't been trained on this kind of task.
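
To make the "all previously generated tokens as inputs" point concrete, here is a minimal sketch of an autoregressive decoding loop. Note that `toy_next_token` is a made-up stand-in for a real model (it just sums the prefix), and the names `generate`, `prompt`, and `steps` are my own for illustration, not from any particular library.

```python
from typing import Callable, List


def generate(next_token: Callable[[List[int]], int],
             prompt: List[int],
             steps: int) -> List[int]:
    """Append `steps` tokens, each computed from the full prefix so far."""
    tokens = list(prompt)  # copy so the caller's prompt is not mutated
    for _ in range(steps):
        # The predictor sees every earlier token, including its own outputs.
        tokens.append(next_token(tokens))
    return tokens


def toy_next_token(prefix: List[int]) -> int:
    # Hypothetical stand-in for a model: depends on every token in the prefix.
    return sum(prefix) % 10


if __name__ == "__main__":
    print(generate(toy_next_token, [1, 2], 4))
```

Whether a trained model actually makes good use of that full prefix for a given task is a separate question from whether the loop hands it the prefix.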
1 comment

Really? I don’t know of a positional encoding scheme that’ll handle this.