by formalsystem
608 days ago
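You can estimate the impact of context length with a back-of-the-envelope calculation of KV cache size: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length. The leading 2 covers keys and values, and for models with grouped-query attention the head count should be the number of KV heads rather than query heads. Some pretty charts here: https://github.com/pytorch/ao/issues/539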
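To make the arithmetic concrete, here is a rough sketch of that formula in Python. The function name and the model shape (32 layers, 32 KV heads, head_dim 128, fp16) are assumptions picked for illustration, not numbers from the linked issue.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, bytes_per_element,
                   batch_size, sequence_length):
    """Back-of-the-envelope KV cache size in bytes."""
    # The factor of 2 accounts for storing both keys and values.
    return (2 * layers * kv_heads * head_dim
            * bytes_per_element * batch_size * sequence_length)


if __name__ == "__main__":
    # Hypothetical model shape, fp16 cache (2 bytes per element), batch size 1.
    for seq_len in (4096, 32768, 131072):
        size = kv_cache_bytes(32, 32, 128, 2, 1, seq_len)
        print(f"{seq_len:>7} tokens: {size / 2**30:.1f} GiB")
```

With those assumed numbers the cache costs 0.5 MiB per token, so it grows linearly from 2 GiB at 4k tokens to 64 GiB at 128k tokens.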