| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ru552 791 days ago
	easiest is probably with ollama [0]. I think the ollama API is OpenAI compatible. [0]https://ollama.com/

2 comments

Most inference servers are OpenAI-compatibile. Even the "official" llama-cpp server should work fine: https://github.com/ggerganov/llama.cpp/blob/master/examples/...

Ollama runs locally. What's the best option for calling the new Mixtral model on someone else's server programmatically?