by bearjaws
858 days ago
I believe that is where they are implying they do it without dramatically increasing memory utilization. If 1M context uses 32x the memory of 32k, it's a non-starter. Even a smallish LLM like Mixtral uses 4-8 GB of memory just for your prompt. You would have 256+ GiB at 1M...
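For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch of KV cache size under standard attention, where the cache grows linearly with context length. The function name `kv_cache_bytes` is made up for illustration, and the defaults (32 layers, 8 KV heads, head dimension 128, 2 bytes per value for fp16) are my assumptions about Mixtral 8x7B's published config, not something taken from the paper being discussed.

```python
def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV cache size for one sequence (hypothetical helper).

    Defaults are assumed Mixtral 8x7B-like values; adjust for other models.
    """
    # Factor of 2 because both keys and values are cached per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3

for label, tokens in [("32k", 32 * 1024), ("1M", 1024 * 1024)]:
    fp16 = kv_cache_bytes(tokens) / GIB
    fp32 = kv_cache_bytes(tokens, bytes_per_value=4) / GIB
    print(f"{label}: {fp16:.0f} GiB at fp16, {fp32:.0f} GiB at fp32")
```

Under those assumptions this works out to roughly 4 GiB (fp16) to 8 GiB (fp32) at 32k, and 32x that at 1M, which is why a linear-in-context cache gets impractical so quickly.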