by golol
1188 days ago
Well, all the information it learns at runtime is encoded in the context window.
I don't feel like the state space {tokens}^ctxWindow is immeasurably complex. I think one should see a transformer as a stochastic computer operating on its memory. If you modelled a computer as a stochastic process, would you take the state space to consist of the most recent instruction, or instead the whole memory of the computer?
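
To make the state-space point concrete, here is a rough toy sketch, not a real transformer. The vocabulary, the window length, and the `next_token_distribution` function are all made up for illustration. The only thing it is meant to show is that if the state is the whole context window, the next state depends on nothing but the current state, and the state space has |tokens|^ctxWindow elements.

```python
import random
from itertools import product

VOCAB = ("a", "b", "c")  # hypothetical toy vocabulary
CTX_WINDOW = 4           # hypothetical context length


def next_token_distribution(state):
    """Toy stand-in for a model: the weights depend on the whole window."""
    # Weight each token by 1 + how often it already appears in the window.
    weights = [1 + state.count(tok) for tok in VOCAB]
    total = sum(weights)
    return [w / total for w in weights]


def step(state, rng):
    """One transition: sample a token from the current state, slide the window."""
    probs = next_token_distribution(state)
    token = rng.choices(VOCAB, weights=probs, k=1)[0]
    return state[1:] + (token,)


# The state space is every possible window: {tokens}^ctxWindow.
all_states = list(product(VOCAB, repeat=CTX_WINDOW))

# Run the chain for a few steps from a fixed starting window.
rng = random.Random(0)
state = ("a",) * CTX_WINDOW
trajectory = [state]
for _ in range(5):
    state = step(state, rng)
    trajectory.append(state)
```

Taking only the most recent token as the state would not work here, because `next_token_distribution` reads the entire window. That is the same as modelling a computer by its last instruction instead of its whole memory.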