by polishgladiator
1045 days ago
> If anyone has specific technical questions I'd be happy to answer as best I can.

What is the context size for these measurements? Is it the full 4k for Llama-2? And just to be clear, when you say memory footprint, this is the entire memory footprint, right? Weights, 4k KV cache, etc.?

And more generally, I'm curious about the use case for running puny models like Llama-2 7B in the cloud on desktop GPUs (like a 4090) with batch==32?
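
For scale, here is a back-of-envelope sketch of the KV-cache part of that footprint. It assumes the published Llama-2 7B shape (32 layers, 32 attention heads, head dimension 128, no grouped-query attention) and an fp16 cache. The function name `kv_cache_bytes` is just illustrative, and real runtimes add overhead (activations, allocator fragmentation) that this ignores.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    """Rough KV-cache size: one key and one value vector per layer, head, and token."""
    # Leading factor of 2 accounts for storing both keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Assumed Llama-2 7B shape with an fp16 cache (2 bytes per element).
per_seq = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=1)
batch32 = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096, batch=32)

print(per_seq / GIB)   # 2.0
print(batch32 / GIB)   # 64.0
```

Under those assumptions the cache alone is 2 GiB per full-length sequence and 64 GiB at batch 32, well beyond a 4090's 24 GB before even counting roughly 13 GB of fp16 weights, unless the cache is quantized or the sequences are much shorter. That is why the context size behind the measurements matters.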