by bbahn 956 days ago
The context window IS longer, but it's less powerful. Obviously, they can't afford to run full attention over the entire context. With naive attention the score matrix grows quadratically with context length, so that would be an impossibly large amount of RAM. They're likely using some combination of sliding-window, cyclical, or other modified attention mechanisms, probably with some degree of summarization.
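
To put rough numbers on the scaling argument, here's a minimal sketch comparing how many (query, key) pairs causal full attention has to score versus a sliding window. This is purely illustrative: the function names and the window size are my own assumptions, and nothing here reflects what any vendor actually does.

```python
def full_attention_pairs(n: int) -> int:
    """(query, key) pairs for causal full attention over n tokens."""
    # Token i attends to tokens 0..i, so the total is 1 + 2 + ... + n.
    return n * (n + 1) // 2


def sliding_window_pairs(n: int, window: int) -> int:
    """Pairs when each token sees only itself and the previous window-1 tokens."""
    return sum(min(i + 1, window) for i in range(n))


def sliding_window_mask(n: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query i may attend to key j."""
    return [[0 <= i - j < window for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    n, window = 32_768, 4_096  # hypothetical sizes, chosen only for illustration
    print("full:   ", full_attention_pairs(n))
    print("sliding:", sliding_window_pairs(n, window))
```

Full attention grows roughly as n^2 / 2 while the windowed version grows roughly as n * window, which is the trade the comment is guessing at: cheaper long contexts, but distant tokens are no longer directly visible to each other.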