by alexellisuk
132 days ago
Is this going to need one or two of those RTX PRO 6000s to leave room for a decent KV cache at an active context length of 64-100k tokens? It's one thing running the model without any context, but coding agents build it up close to the max, and in my experience that slows down generation massively.
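
For a rough sense of the sizing, here is a back-of-envelope sketch of how KV cache memory grows with context. It assumes standard multi-head or grouped-query attention with an FP16 cache and a single sequence; the layer count, KV head count, and head dimension below are hypothetical placeholders, not this model's real configuration, and architectures that compress or window the cache would come out smaller.

```python
def kv_cache_gib(num_layers, num_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV cache size in GiB for one sequence.

    Assumes plain multi-head / grouped-query attention: every layer stores
    one key and one value vector per KV head for every token in context.
    bytes_per_value=2 corresponds to an FP16/BF16 cache.
    """
    # Leading factor of 2 covers the key tensor plus the value tensor.
    total_bytes = (
        2 * num_layers * num_kv_heads * head_dim * context_tokens * bytes_per_value
    )
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    layers, kv_heads, head_dim = 48, 8, 128
    for ctx in (64_000, 100_000):
        size = kv_cache_gib(layers, kv_heads, head_dim, ctx)
        print(f"{ctx:>7} tokens: {size:.2f} GiB of KV cache")
```

Whatever this comes to for the real config has to fit alongside the weights and activation overhead, which is what decides whether one card is enough.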