| HN Mirror

	by formalsystem 608 days ago
	You can estimate the impact of context length with a back-of-the-envelope calculation of KV cache size: 2 * layers * attention_heads * head_dim * bytes_per_element * batch_size * sequence_length (the factor of 2 covers keys and values). Some pretty charts here: https://github.com/pytorch/ao/issues/539
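
To make the arithmetic concrete, here is a minimal Python sketch of that formula. The function name `kv_cache_bytes` and the example config (32 layers, 32 heads, head_dim 128, fp16, batch 1, 4096 tokens) are illustrative assumptions, not numbers from the linked issue. It also assumes plain multi-head attention where every attention head has its own K/V.

```python
def kv_cache_bytes(layers, attention_heads, head_dim,
                   bytes_per_element, batch_size, sequence_length):
    """Back-of-the-envelope KV cache size in bytes.

    The leading 2 accounts for storing both keys and values.
    """
    return (2 * layers * attention_heads * head_dim
            * bytes_per_element * batch_size * sequence_length)


# Hypothetical config for illustration only (assumed values).
size = kv_cache_bytes(
    layers=32,
    attention_heads=32,
    head_dim=128,
    bytes_per_element=2,   # fp16/bf16
    batch_size=1,
    sequence_length=4096,
)
print(f"{size / 2**30:.2f} GiB")  # 2.00 GiB
```

Every term is a plain multiplier, so the cache grows linearly with sequence_length: doubling the context to 8192 tokens doubles this estimate to 4 GiB under the same assumptions.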