by IceHegel
292 days ago
There's a chance this memory problem won't be that easy to solve. It's true that context lengths have gotten much longer, but not all context is created equal. There's a significant loss of model sharpness once context goes past around 100K tokens, sometimes earlier, sometimes later. Even when you use today's context windows to their maximum extent, the models aren't always especially nuanced over long contexts. I compact after 100K tokens.