|
|
|
|
|
by sshumaker
729 days ago
|
|
It’s how LLMs work - they are effectively recursive at inference time, after each token is sampled, you feed it back in. You will end up with the same model state (not including noise) as if that had been the original input prompt. |
|
And no one here touched how high a multiple the cost is, so I assume its pretty high.