|
|
|
|
|
by jmmcd
1469 days ago
|
|
As I said, I think it's interesting if it uses the downtime for something else. (Like a chess player thinking on the opponent's click.) Talking to itself, speculating about continuations, etc - but not actually outputting them (else it's just a longer response). Instead, storing some parts in the buffer. Perhaps bifurcating. Even better, using some down time to summarise recent material and store the summary in the buffer. That's not a bad description of "thinking". |
|
The whole motivating aspect of these models is "attention is all you need" to reduce dependencies from recurrence, which would otherwise halt this obscene scaling in its tracks.