by macrolime
807 days ago
"This is more computationally efficient than performing a full content-based lookup across an entire memory buffer for each step in the future, and could be one step towards drastically increasing the context-length available for making a prediction." Is this how they get a context window of 10 million tokens? Or are they refering to even longer context windows in the future? |