

by eschnou 714 days ago

For pure self-hosting, I'd look into Ollama (ollama.com) or llamafile (https://github.com/Mozilla-Ocho/llamafile) on the LLM side, and then pick a UI such as https://openwebui.com/ or one of the many in this list: https://github.com/ollama/ollama?tab=readme-ov-file#web--des...
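To give a feel for what a UI does under the hood when it talks to Ollama, here is a minimal sketch. It assumes Ollama is running locally on its default port (11434) and that a model tagged `llama3` has already been pulled; the helper names `build_generate_request` and `ask` are made up for illustration.

```python
import json
import urllib.request

# Default local endpoint of Ollama's generate API (assumes a stock install).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `ask("Why is the sky blue?")` should return the model's answer as a string. Nothing here is specific to a UI; it is just the raw HTTP call.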

However, the issue you will quickly encounter is resources/costs. For a simple model like llama3-8b you need at least a g5.2xlarge on AWS. If you want a 'ChatGPT equivalent' model, you need something like llama3-70b or command-r-plus etc. These will require at least a g5.48xlarge, which will cost you about $20 an hour.
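To put that hourly figure in perspective, a quick back-of-the-envelope calculation, assuming the instance runs around the clock for a 30-day month at the rough $20/hour quoted above (not an exact AWS price):

```python
HOURS_PER_DAY = 24


def always_on_cost(hourly_usd: float, days: int = 30) -> float:
    """Cost in USD of keeping one instance running 24/7 for `days` days."""
    return hourly_usd * HOURS_PER_DAY * days


# $20/hour is the rough figure from the comment above, not a quoted AWS price.
print(always_on_cost(20.0))  # 14400.0
```

So an always-on box at that rate lands around $14,400 a month before storage or traffic.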

An alternative approach is going hybrid: a self-hosted UI that takes care of user access, shared documents (RAG), custom prompts etc., but that you hook up to an LLM provider where you pay per token (could be the OpenAI platform or anything from Hugging Face).

Let me know if this helps! Also note that I'm the lead dev of an open source project addressing these kinds of needs: https://opengpa.org - Feel free to jump on our Discord to discuss.

1 comment

clark-kent 713 days ago

If you are already on AWS, then it's better to run llama3 using AWS Bedrock. With Bedrock, you only pay for what you use instead of paying for an always-on EC2 instance.
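A rough way to see where pay-per-use stops being cheaper is to compute the break-even throughput. This is only a sketch: the $20/hour comes from the parent comment, and the per-token price below is a made-up placeholder, not a Bedrock quote, so plug in the current numbers from the provider's pricing page.

```python
def break_even_tokens_per_hour(instance_usd_per_hour: float,
                               usd_per_million_tokens: float) -> float:
    """Tokens per hour above which an always-on instance is cheaper than per-token billing."""
    if usd_per_million_tokens <= 0:
        raise ValueError("per-token price must be positive")
    return instance_usd_per_hour / usd_per_million_tokens * 1_000_000


# Both inputs are illustrative: ~$20/hour instance, hypothetical $5 per million tokens.
print(break_even_tokens_per_hour(20.0, 5.0))  # 4000000.0
```

With those placeholder numbers you would need to push about 4 million tokens every hour, all day, before the dedicated instance wins on cost alone.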
