As I understand it, auto-regressive LLMs do this, though the details vary: some feed the combined input and output[1] back through the whole net, like GPT-2 and friends, while others feed it back through just the decoder[2]. I described the former, and I should have made that clear.
In either case you can "prime" it, as was suggested.
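To make the "priming" idea concrete, here is a rough sketch of the loop I have in mind. `toy_next_token` is a made-up stand-in for a real model (it is just arithmetic on the tokens so far), and `generate` and its parameters are my own names, not any library's API. The only point is that the output is appended to the input and fed back in, so you can seed the output with a prefix of your choosing.

```python
from typing import Callable, List, Sequence


def toy_next_token(tokens: List[int], vocab_size: int = 10) -> int:
    """Hypothetical stand-in for a model: derives the next token
    from the entire sequence seen so far (prompt plus output)."""
    return (3 * sum(tokens) + len(tokens)) % vocab_size


def generate(
    prompt: Sequence[int],
    steps: int,
    prime: Sequence[int] = (),
    next_token: Callable[[List[int]], int] = toy_next_token,
) -> List[int]:
    """Auto-regressive loop: each new token is appended to the
    sequence, and the whole sequence is fed back in next step.

    `prime` is a prefix we pretend the model already produced,
    so generation continues from it.
    """
    seq = list(prompt) + list(prime)
    for _ in range(steps):
        seq.append(next_token(seq))
    # Return only the output part (prime included), not the prompt.
    return seq[len(prompt):]


unprimed = generate([1, 2, 3], steps=3)
primed = generate([1, 2, 3], steps=3, prime=[7])
```

With this toy rule the primed run starts with the 7 we supplied and then continues differently from the unprimed run, because every later step sees the primed token as part of its input.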
A regular RNN has more feedback[3], like each layer feeding back to itself, as I understand it.
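Roughly what I mean by a layer feeding back to itself, as a single-unit sketch. The scalar weights `w_x` and `w_h` are arbitrary values I picked for illustration, and a real RNN would use weight matrices and learned parameters.

```python
import math
from typing import List, Sequence


def rnn_step(x: float, h_prev: float, w_x: float = 0.5, w_h: float = 0.8) -> float:
    """One recurrent step: the new hidden state depends on the current
    input and on the layer's own previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev)


def run_rnn(inputs: Sequence[float], h0: float = 0.0) -> List[float]:
    """Run the unit over a sequence, carrying the hidden state forward."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)  # the feedback: h from the last step goes back in
        states.append(h)
    return states


states = run_rnn([1.0, 0.0, 0.0])
```

Even though the later inputs are zero, the hidden state stays nonzero because it keeps being fed back in. That is state carried inside the layer, as opposed to the auto-regressive case where the only thing fed back is the emitted tokens.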
Happy to be corrected though.
[1]: https://jalammar.github.io/illustrated-gpt2/#one-difference-...
[2]: https://medium.com/@ikim1994914/understanding-the-modern-llm...
[3]: https://karpathy.github.io/2015/05/21/rnn-effectiveness/