by yazaddaruvala
317 days ago
At least in theory. If the model is the same, it can reuse the stored embeddings rather than recomputing them. I believe this is what they mean. In practice, how fast will the model change (including the tokenizer)? How fast will the vector DB be fully backfilled to match the model version? That would be the “cache hit rate” of sorts, and how much it helps likely depends on those variables for your specific corpus and query volume.
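To make the "cache hit rate" idea concrete, here is a rough sketch. It assumes each stored embedding is tagged with the version of the model (and tokenizer) that produced it. The names `StoredEmbedding`, `model_version`, and `reusable_fraction` are made up for illustration and are not any provider's API.

```python
from dataclasses import dataclass


@dataclass
class StoredEmbedding:
    doc_id: str
    model_version: str  # hypothetical tag: which model + tokenizer produced the vector
    vector: list[float]


def reusable_fraction(
    store: dict[str, StoredEmbedding],
    retrieved_doc_ids: list[str],
    current_version: str,
) -> float:
    """Fraction of retrieved docs whose stored embedding matches the current model.

    Anything that is missing or tagged with an older version counts as a miss,
    since it would have to be recomputed.
    """
    if not retrieved_doc_ids:
        return 0.0
    hits = sum(
        1
        for doc_id in retrieved_doc_ids
        if doc_id in store and store[doc_id].model_version == current_version
    )
    return hits / len(retrieved_doc_ids)


# Toy example: two docs already backfilled to "v2", one still on "v1".
store = {
    "a": StoredEmbedding("a", "v2", [0.1, 0.2]),
    "b": StoredEmbedding("b", "v2", [0.3, 0.4]),
    "c": StoredEmbedding("c", "v1", [0.5, 0.6]),
}
rate = reusable_fraction(store, ["a", "b", "c"], current_version="v2")
```

A faster backfill or a slower model release cadence pushes this fraction toward 1; a model or tokenizer change drops it to 0 until the backfill catches up.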
|
I can't find any evidence that this is possible with Gemini or any other LLM provider.