by pbbakkum
1021 days ago
I've experimented with it. The reason I haven't added it yet is that I want deployment to be seamless, and it's not trivial to ship a binary that efficiently supports both Metal and CUDA without extra fuss or configuration, and that also downloads the models gracefully. This is of course possible, but it's still hard, and it's not clear that it's the right place to spend energy. I'm curious how you think about it: is your primary desire to work offline, to avoid sending data to OpenAI, or both?
FWIW, my understanding is that llama.cpp is pretty easy to integrate and is reasonably fast for being API-agnostic. Ollama embeds it, for example. No pressure, just pointing it out :)