by antirez
418 days ago
Download the model in the background. Serve the client through an LLM vendor API just for the first requests, or even with that same local LLM installed on your own servers (likely cheaper). That way, in the long run the inference cost is near zero, which makes it possible to use LLMs in business models that would otherwise be impossible (like freemium).
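
A rough sketch of that routing idea, in case it helps: requests go to a remote completion function until a background thread has finished loading the local model, then they switch over. Everything here is hypothetical (`FallbackRouter`, `remote`, `load_local`, `wait_until_local` are made-up names, not any vendor's or library's API), and the "model" is just a callable taking a prompt and returning text.

```python
import threading
from typing import Callable, Optional

# Assumption: a "model" is any callable that maps a prompt to a completion.
Completion = Callable[[str], str]


class FallbackRouter:
    """Send prompts to a remote API until the local model is ready."""

    def __init__(self, remote: Completion, load_local: Callable[[], Completion]):
        self._remote = remote
        self._local: Optional[Completion] = None
        self._ready = threading.Event()
        # Start the (possibly slow) download/load without blocking requests.
        self._thread = threading.Thread(
            target=self._load, args=(load_local,), daemon=True
        )
        self._thread.start()

    def _load(self, load_local: Callable[[], Completion]) -> None:
        try:
            self._local = load_local()
        except Exception:
            # If loading fails, keep serving from the remote API.
            return
        self._ready.set()

    def wait_until_local(self, timeout: Optional[float] = None) -> bool:
        """Block until the local model is ready; True if it became ready."""
        return self._ready.wait(timeout)

    def complete(self, prompt: str) -> str:
        # Once the local model is loaded, every request stays local.
        if self._ready.is_set() and self._local is not None:
            return self._local(prompt)
        return self._remote(prompt)
```

In a real client, `load_local` would be whatever downloads the weights and initializes the runtime, and `remote` would wrap the vendor call or your own server. The only design point is that the switch is one-way: after the local model is ready, nothing goes back to the remote side.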
|
I got carried away. What I meant to say is that using the cloud as a fallback for local models is something I absolutely don’t want or need, because privacy is the whole and only point of local models.