Hacker News
by car 473 days ago
Could this work with llama.cpp, since it’s the engine behind Ollama?

I usually build llama.cpp from source and download quantized (GGUF) models from Hugging Face; I haven’t used Ollama so far.


Not yet. For now I’ve only made it work with Ollama, but supporting llama.cpp directly would be ideal. Thanks, I’ll take note of it.
That would be great. llama.cpp’s built-in server also offers HTTP embedding endpoints.
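Here is a minimal sketch of what talking to llama.cpp directly for embeddings might look like, in Python with only the standard library. It assumes `llama-server` is running locally on its default port 8080 with embeddings enabled (e.g. via its `--embedding` flag; check `llama-server --help` for your build) and that it serves the OpenAI-compatible `/v1/embeddings` route with list input. The function names are hypothetical and not part of the tool discussed in this thread.

```python
import json
import urllib.request

# Assumption: llama-server runs locally on its default port with embeddings
# enabled. Adjust BASE_URL for your setup.
BASE_URL = "http://127.0.0.1:8080"


def build_embedding_request(texts, base_url=BASE_URL):
    """Build a POST request for the (assumed) OpenAI-compatible /v1/embeddings route."""
    # Assumption: the server accepts a list of strings under "input" and,
    # since it serves a single model, does not require a "model" field.
    body = json.dumps({"input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_embeddings(payload):
    """Extract vectors from an OpenAI-style response, ordered by 'index'."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, base_url=BASE_URL, timeout=30):
    """Send texts to the server and return one embedding vector per input."""
    request = build_embedding_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_embeddings(json.load(response))
```

With the server running, `embed(["hello world"])` should return a list containing one vector of floats. Targeting the OpenAI-style request/response shape keeps the client small, and pointing `base_url` at a different server that speaks the same schema would not require other changes.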