| HN Mirror

Auto-regressive LLMs do this as I understand it, though it can vary if they feed the combined input and output[1] through the whole net like GPT-2 and friends, or just the decoder[2]. I described the former, and I should have clarified that.

In either case you can "prime it" like it was suggested.

A regular RNN has more feedback[3], like each layer feeding back to itself, as I understand it.