Size of KV cache = 2 * (num_layers) * (num_kv_heads * dim_head) * seq_length * precision

8-bit Gemma 27B KV cache = 2 * (46) * (16 * 144) * 1e6 * 1 byte ≈ 200 GB
Formula: https://developer.nvidia.com/blog/mastering-llm-techniques-i...
Gemma 27B config: https://huggingface.co/google/gemma-2-27b/blob/main/config.j...
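
For anyone who wants to re-run the arithmetic, here is a minimal sketch of the formula above. The function name `kv_cache_bytes` and its parameter names are made up for illustration; the numbers are the ones quoted in this comment (46 layers, 16 KV heads, head dimension 144, 1e6 tokens, 1 byte per value), so check them against the linked config before relying on the result.

```python
def kv_cache_bytes(num_layers, num_kv_heads, dim_head, seq_length, bytes_per_value):
    """Approximate KV cache size in bytes for a single sequence.

    The leading factor of 2 accounts for storing both a key and a
    value vector per token, per KV head, per layer.
    """
    return 2 * num_layers * (num_kv_heads * dim_head) * seq_length * bytes_per_value


# Values as quoted in the comment above (assumed, not re-verified here).
size = kv_cache_bytes(
    num_layers=46,
    num_kv_heads=16,
    dim_head=144,
    seq_length=1_000_000,
    bytes_per_value=1,  # 8-bit cache
)
print(f"{size / 1e9:.0f} GB")  # prints "212 GB", i.e. roughly 200 GB
```

Swapping `bytes_per_value` to 2 gives the 16-bit figure, which is simply double.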