by keriati1
858 days ago
I think it is even easier right now for companies to self-host an inference server with basic RAG support:
- get a Mac Mini or Mac Studio
- just run ollama serve
- run ollama web-ui in Docker
- add some coding assistant model from ollamahub with the web-ui
- upload your documents in the web-ui

No code needed: you have your self-hosted LLM with basic RAG giving you answers with your documents in context.
For us the DeepSeek Coder 33B model is fast enough on a Mac Studio with 64 GB of RAM and gives pretty good suggestions based on our internal coding documentation.
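
The setup above needs no code, but if you want to script against the same server, here is a minimal sketch of what "documents in context" could look like. It assumes Ollama is listening on its default local port (11434) and that the model was pulled under the tag `deepseek-coder:33b`; both are assumptions, so adjust them to your install. The helper names (`build_prompt`, `build_request`, `ask`) are made up for illustration, and this skips any real retrieval step: it simply pastes the documents in front of the question, which is not necessarily how the web-ui does it.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: the tag the model was pulled under; check `ollama list`.
MODEL = "deepseek-coder:33b"


def build_prompt(question, documents):
    """Put the documents in front of the question so the model answers from them."""
    context = "\n\n".join(
        f"[doc {i}]\n{text.strip()}" for i, text in enumerate(documents, start=1)
    )
    return (
        "Answer using only the documentation below.\n\n"
        f"{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def build_request(prompt, model=MODEL, url=OLLAMA_URL):
    """Create a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, documents):
    """Send question plus documents to the local server and return the reply text."""
    request = build_request(build_prompt(question, documents))
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, something like `ask("Do public functions need type hints?", [style_guide_text])` would return the model's answer as a string, where `style_guide_text` is a placeholder for one of your own documents. Pasting everything into the prompt only works while the documents fit in the model's context window; beyond that you would need an actual retrieval step.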