by bbahn
956 days ago
The context window IS longer, but it's less powerful. Obviously, they can't afford full transformer attention over the entire context; that would take an impossibly large amount of RAM. They're likely using some combination of sliding-window, cyclical, or otherwise modified attention, probably with some degree of summarization.
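
To put rough numbers on the RAM point, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 100,000-token context, the 4,096-token window, and 2 bytes per score are made-up figures, not anything known about the actual model. It also counts a naively materialized score matrix for a single head in a single layer; real kernels can avoid storing the whole matrix at once, so treat this as a picture of quadratic vs. linear growth, not a real memory estimate.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Score entries when every token attends to every token (n x n)."""
    return n_tokens * n_tokens


def sliding_window_scores(n_tokens: int, window: int) -> int:
    """Score entries when each token attends to at most `window` tokens."""
    return n_tokens * min(window, n_tokens)


def to_gib(n_entries: int, bytes_per_entry: int = 2) -> float:
    """Convert an entry count to GiB, assuming a fixed size per entry."""
    return n_entries * bytes_per_entry / 2**30


if __name__ == "__main__":
    n_tokens = 100_000  # hypothetical context length
    window = 4_096      # hypothetical sliding-window size

    full = full_attention_scores(n_tokens)
    windowed = sliding_window_scores(n_tokens, window)

    # Per head, per layer, naive materialization only.
    print(f"full attention: {to_gib(full):.2f} GiB")
    print(f"sliding window: {to_gib(windowed):.2f} GiB")
    print(f"ratio:          {full / windowed:.1f}x")
```

The full version grows with the square of the context length, while the windowed version grows linearly, which is why some kind of restricted attention seems like the plausible guess.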