by bearjaws 858 days ago
I believe that is where they imply they can do it without dramatically increasing memory utilization.

If a 1M-token context uses 32x the memory of a 32k context, it's a non-starter. Even a smallish LLM like Mixtral uses 4-8 GiB of memory just for your prompt at 32k. Scale that by 32 and you would need 128-256+ GiB at 1M...
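A rough way to check the 32x figure is to size the key/value (KV) cache, which grows linearly with the number of tokens. This is a minimal sketch that counts only the KV cache and ignores weights and activations. The model shape is an assumption meant to resemble Mixtral 8x7B (32 layers, 8 KV heads via grouped-query attention, head dim 128, fp16). The helper name `kv_cache_bytes` is hypothetical.

```python
GIB = 1024 ** 3


def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2, batch=1):
    """Bytes needed to cache keys and values for `seq_len` tokens.

    Factor 2 = one key tensor plus one value tensor per layer.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len * batch


# Assumed Mixtral-8x7B-like shape (not taken from the comment): fp16 cache.
MIXTRAL_LIKE = dict(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

if __name__ == "__main__":
    for tokens in (32 * 1024, 1024 * 1024):
        gib = kv_cache_bytes(tokens, **MIXTRAL_LIKE) / GIB
        print(f"{tokens:>9,} tokens -> {gib:6.1f} GiB")
```

Under these assumptions it prints 4.0 GiB for 32k tokens and 128.0 GiB for 1M tokens, exactly 32x. An fp32 cache (`bytes_per_elem=4`) doubles both numbers, which lands at the 8 GiB / 256 GiB end of the range above.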