It can. Look up "zero-shot learning": both sentences would be part of a single "evaluation", and the first sentence would prime the output of the second. (You basically combine multiple "evaluations" into one, and the context is held in tensors / blobs.)
Sure, but I feel like we're talking about different things. I consider the "context held in tensors" to be part of the model. That is, if you zero out these registers, the model evolves in the same deterministic way every time. In this case, I assume those tensors are always initialized before each query you perform.
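
To make the distinction concrete, here is a minimal Python sketch. It assumes a made-up toy recurrent model: `ToyRecurrentModel`, `query`, and `reset_first` are hypothetical names for illustration, not any real library's API. The weights are fixed and the "context held in tensors" is just a small state vector. Resetting that state before each query gives the same result every time; carrying it over from an earlier sentence lets that sentence influence the next one.

```python
import math


class ToyRecurrentModel:
    """Hypothetical toy model: fixed weights plus a mutable context state."""

    def __init__(self, size=4):
        self.size = size
        self.state = [0.0] * size  # stands in for the "context held in tensors"

    def reset(self):
        # Zero out the registers, as described above.
        self.state = [0.0] * self.size

    def feed(self, text):
        # Update the state one character at a time with a fixed, deterministic rule.
        for ch in text:
            x = (ord(ch) % 31) / 31.0
            self.state = [
                math.tanh(
                    0.5 * self.state[i]
                    + 0.3 * self.state[(i + 1) % self.size]
                    + 0.1 * (i + 1) * x
                )
                for i in range(self.size)
            ]
        return sum(self.state)  # a single number standing in for "the output"


def query(model, text, reset_first=True):
    # reset_first=True models "tensors are always initialized before your query".
    if reset_first:
        model.reset()
    return model.feed(text)


model = ToyRecurrentModel()

# Two independent queries with the state reset each time.
a = query(model, "second sentence")
b = query(model, "second sentence")

# One combined "evaluation": the first sentence is fed first and its state is kept.
model.reset()
model.feed("first sentence ")
c = query(model, "second sentence", reset_first=False)

print("reset each time, same output:", a == b)
print("primed by first sentence, different output:", c != a)
```

Whether you call the carried-over state "part of the model" or "part of the input" is exactly the terminology question in this exchange; the sketch only shows that the two setups behave differently.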