by woadwarrior01
971 days ago
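BERT-style encoder-only models, like the embedding model being discussed here, don't need a KV cache for inference: they process the whole input in a single forward pass, so there are no keys and values from earlier steps to reuse. A KV cache only matters for efficient inference with encoder-decoder and decoder-only (aka GPT) models, which generate tokens one at a time and would otherwise recompute the keys and values of the entire prefix at every step.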
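
To make the difference concrete, here is a rough sketch in plain Python. It is not taken from any real library: the function names `encoder_kv_projections` and `decoder_kv_projections` are made up for illustration, and it only counts how many token-level key/value projections a single attention layer would perform, assuming greedy one-token-at-a-time decoding.

```python
def encoder_kv_projections(seq_len: int) -> int:
    """Encoder-only model: one forward pass over the full input,
    so each token's K/V is projected exactly once."""
    return seq_len


def decoder_kv_projections(prompt_len: int, new_tokens: int, use_cache: bool) -> int:
    """Autoregressive model: one forward pass per generated token.
    Counts K/V projections for one layer (hypothetical helper)."""
    total = 0
    cached = 0  # number of tokens whose K/V are already stored
    for step in range(new_tokens):
        context_len = prompt_len + step  # tokens visible at this step
        if use_cache:
            total += context_len - cached  # project only the unseen tokens
            cached = context_len
        else:
            total += context_len  # recompute K/V for the whole prefix
    return total


if __name__ == "__main__":
    print(encoder_kv_projections(8))                      # 8
    print(decoder_kv_projections(8, 4, use_cache=False))  # 38
    print(decoder_kv_projections(8, 4, use_cache=True))   # 11
```

With an 8-token input, the encoder does 8 projections and is done, so a cache would have nothing to save. The decoder generating 4 tokens does 38 without a cache versus 11 with one, and that gap grows quadratically with the number of generated tokens.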