Hacker News new | ask | show | jobs
by ru552 791 days ago
easiest is probably with ollama [0]. I think the ollama API is OpenAI compatible.

[0]https://ollama.com/

2 comments

Most inference servers are OpenAI-compatibile. Even the "official" llama-cpp server should work fine: https://github.com/ggerganov/llama.cpp/blob/master/examples/...
Ollama runs locally. What's the best option for calling the new Mixtral model on someone else's server programmatically?