Hacker News
by omneity 990 days ago
Does an LLM need to loop back to re-read its input, even in a regular (read: non-sliding) context window?

Maybe I'm misunderstanding, but doesn't the hidden state solve the "lookup" problem in this case? The LLM needs to ingest your entire input before answering anyway, so whether your instruction is at the front or at the end should make little difference, apart from its effect on attention.

1 comment

It's my understanding that in regular (non-sliding-window) models the LLM is able to pay attention to any part of the input when generating the output. An attention head is essentially able to jump to any point in its context window. This is what differentiates the attention mechanism from other models that use token proximity as a proxy for relevance.
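
To make the difference concrete, here is a minimal toy sketch in plain Python, not taken from any real model. It assumes a single attention head, hand-picked relevance scores, and a hypothetical window size of 3; the function names are made up for illustration. It compares how much weight the last token can put on position 0 (say, an instruction at the very front) under a full causal mask versus a sliding-window mask.

```python
import math

def attention_weights(scores, allowed):
    """Softmax over scores, restricted to positions where allowed[i] is True."""
    masked = [s if ok else -math.inf for s, ok in zip(scores, allowed)]
    m = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def full_causal_mask(query_pos, seq_len):
    # The query token may attend to itself and every earlier position.
    return [k <= query_pos for k in range(seq_len)]

def sliding_window_mask(query_pos, seq_len, window):
    # The query token may attend only to the `window` most recent
    # positions, itself included.
    return [query_pos - window < k <= query_pos for k in range(seq_len)]

seq_len = 8
query_pos = seq_len - 1  # the token being generated from

# Toy scores (assumed, not learned): position 0 is by far the most relevant.
scores = [5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

full = attention_weights(scores, full_causal_mask(query_pos, seq_len))
windowed = attention_weights(
    scores, sliding_window_mask(query_pos, seq_len, window=3)
)

print("full context, weight on position 0:  ", round(full[0], 3))
print("sliding window, weight on position 0:", round(windowed[0], 3))
```

With the full mask, nearly all the weight lands on position 0 even though it is the farthest token away. With the window, position 0 is masked out entirely, so its weight is exactly zero within this single layer and the weight is spread over the three most recent positions instead.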