by ohcmon
60 days ago
Boris, wait, wait, wait. Why not use a tiered cache? Storage is obviously waaay cheaper than recalculating embeddings all the way from the very beginning of the session. No matter how you put this explanation, it still sounds strange. Hell, you can even store the cache on the client if you must. Please tell me I'm not understanding what is going on... otherwise you really need to hire someone to look at this!)

I still don't understand it. Yes, it's a lot of data, and presumably they're already shunting it to CPU RAM instead of keeping it in precious VRAM, but they could go further and put it on SSD, at which point it's no longer in the hot path for their inference.
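For illustration, here is a minimal sketch of the tiered idea both comments describe (VRAM, then CPU RAM, then SSD), assuming a plain LRU policy in each tier. `TieredCache`, the tier names and the slot counts are hypothetical. Each tier is simulated with an in-memory `OrderedDict`; a real system would hold the cached per-session state (for an LLM, typically attention key/value tensors) in GPU memory, host memory and on disk. A miss on every tier is the only case where the session prefix would have to be recomputed.

```python
from collections import OrderedDict


class TieredCache:
    """Hypothetical toy cache: 'vram' -> 'cpu_ram' -> 'ssd'.

    All tiers are simulated with in-memory OrderedDicts used as LRUs.
    Overflowing entries are demoted one tier down; hits are promoted
    back to the hottest tier.
    """

    def __init__(self, vram_slots, ram_slots):
        # (tier name, storage, capacity); None means unbounded (the "SSD").
        self.tiers = [
            ("vram", OrderedDict(), vram_slots),
            ("cpu_ram", OrderedDict(), ram_slots),
            ("ssd", OrderedDict(), None),
        ]

    def put(self, key, value):
        # Drop any stale copy in other tiers, then insert at the top.
        for _, store, _ in self.tiers:
            store.pop(key, None)
        self._insert(0, key, value)

    def _insert(self, level, key, value):
        _, store, cap = self.tiers[level]
        store[key] = value
        store.move_to_end(key)  # mark as most recently used
        if cap is not None and len(store) > cap:
            old_key, old_value = store.popitem(last=False)  # evict LRU entry
            self._insert(level + 1, old_key, old_value)  # demote, don't discard

    def get(self, key):
        """Return (value, tier it was found in), or (None, None) on a full miss."""
        for name, store, _ in self.tiers:
            if key in store:
                value = store.pop(key)
                self._insert(0, key, value)  # promote back to the hot tier
                return value, name
        return None, None  # caller would have to recompute from scratch
```

With one slot each in the two bounded tiers, putting three sessions pushes the oldest one down to the simulated SSD; reading it back is still a hit (`get("a")` reports `"ssd"`), and it gets promoted so the next read is served from `"vram"`.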