	by woadwarrior01 971 days ago
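	BERT-style encoder-only models, like the embedding model being discussed here, don't need a KV cache for inference: they process the whole input in a single forward pass, so there are no keys and values from earlier steps to reuse. A KV cache is only needed for efficient autoregressive inference with encoder-decoder and decoder-only (aka GPT) models.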
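
	To make that concrete, here is a rough toy sketch in plain Python (single attention head, made-up 2x2 weights, no real model or library API). The names `encoder_pass`, `decode_no_cache`, and `decode_with_cache` are hypothetical and only for illustration. It counts how many per-token key/value projections each approach performs.

```python
import math

# Hypothetical toy projection weights (assumptions, not from any real model).
WQ = [[1.0, 0.0], [0.0, 1.0]]
WK = [[0.5, 0.1], [-0.2, 0.3]]
WV = [[0.2, -0.1], [0.4, 0.6]]


def matvec(W, x):
    """Multiply matrix W by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(values[0])
    return [sum(e / z * v[i] for e, v in zip(exps, values)) for i in range(dim)]


def encoder_pass(tokens):
    """Encoder-only: the full input is known up front, so one pass suffices.

    Every token's K/V is projected exactly once; nothing carries over to a
    later step, so there is nothing to cache.
    """
    keys = [matvec(WK, t) for t in tokens]
    values = [matvec(WV, t) for t in tokens]
    outputs = [attend(matvec(WQ, t), keys, values) for t in tokens]
    return outputs, len(tokens)


def decode_no_cache(tokens):
    """Autoregressive decoding that re-projects K/V for the whole prefix each step."""
    outputs, projections = [], 0
    for step in range(1, len(tokens) + 1):
        prefix = tokens[:step]
        keys = [matvec(WK, t) for t in prefix]
        values = [matvec(WV, t) for t in prefix]
        projections += step
        outputs.append(attend(matvec(WQ, prefix[-1]), keys, values))
    return outputs, projections


def decode_with_cache(tokens):
    """Autoregressive decoding that keeps earlier K/V in a cache."""
    outputs, projections = [], 0
    k_cache, v_cache = [], []
    for t in tokens:
        # Only the newest token is projected; earlier K/V come from the cache.
        k_cache.append(matvec(WK, t))
        v_cache.append(matvec(WV, t))
        projections += 1
        outputs.append(attend(matvec(WQ, t), k_cache, v_cache))
    return outputs, projections
```

	With four input vectors, the uncached decoder does 1 + 2 + 3 + 4 = 10 K/V projections, the cached decoder does 4, and the encoder pass does 4 without ever needing a cache. The cached and uncached decoders produce the same outputs; the cache only saves recomputation.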