by formalsystem 608 days ago
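You can estimate the impact of context length with a back-of-the-envelope calculation of KV cache size in bytes: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length. The factor of 2 covers keys and values. With grouped-query or multi-query attention, use the number of KV heads instead of the full attention head count.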
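
To make the arithmetic concrete, here is a minimal sketch of that formula as a Python function. The function name, parameter names, and the example model shape (32 layers, 32 KV heads, head_dim 128, fp16, 4096 tokens) are illustrative assumptions and are not taken from the linked issue.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, bytes_per_element,
                   batch_size, sequence_length):
    """Rough KV cache size in bytes (hypothetical helper).

    The leading 2 accounts for storing both keys and values.
    kv_heads is the number of key/value heads: equal to the attention
    head count for plain multi-head attention, smaller for GQA/MQA.
    """
    return (2 * layers * kv_heads * head_dim * bytes_per_element
            * batch_size * sequence_length)


# Assumed example shape: 32 layers, 32 KV heads, head_dim 128,
# fp16 (2 bytes per element), batch size 1, 4096-token context.
size = kv_cache_bytes(32, 32, 128, 2, 1, 4096)
print(f"{size / 2**30:.2f} GiB")  # 2.00 GiB
```

Since every term is multiplied, the estimate is linear in sequence_length: doubling the context doubles the cache, and so does doubling the batch size.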

Some pretty charts here https://github.com/pytorch/ao/issues/539