by jianfgo 849 days ago
Does anyone have a tutorial on how to set up and run a self-hosted model?
8 comments

I think the llm utility[0] (the one from Simon, not Google) is probably the best quickstart experience you can find. Gives the option to connect to services via API or install/run local models.

As simple as

  pip install llm
  # add the local plugin
  llm install llm-gpt4all
  # Download and run a prompt against the Orca Mini 3B model
  llm -m orca-mini-3b-gguf2-q4_0 'What is the capital of France?'

Alternatively, you could use llamafile[1], a tiny binary runner that gets packaged on top of the multi-gigabyte models. Download the llamafile and you can launch it through your terminal or a web browser.

From the llamafile page, after you download the file, you can just launch it as

  ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 -p '[INST]Write a story about llamas[/INST]'
[0] https://llm.datasette.io/en/stable/index.html

[1] https://github.com/Mozilla-Ocho/llamafile

Edit: added llm quickstart from the intro page

The Reddit community r/LocalLLaMA has great info.

https://docs.sillytavern.app/usage/local-llm-guide/how-to-us...

Follow the guide all the way until you get to "Loading our model in Oobabooga". Then ignore the rest. You can do inference in Ooba under the Notebook tab.

(You can also ignore the "enabling HTTP API" parts, but that API is quite handy: it's OpenAI-compatible, which means you can use any OpenAI-compatible web UI with it.)
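
To make the OpenAI-compatible point concrete, here is a minimal client sketch using only the Python standard library. The base URL `http://127.0.0.1:5000/v1` and the model name `local-model` are assumptions for illustration, so check what your own server actually listens on. The `/chat/completions` path and the request/response shape follow the OpenAI chat completions convention.

```python
import json
import urllib.request

# Assumption: the local server listens here; adjust to your setup.
BASE_URL = "http://127.0.0.1:5000/v1"


def build_chat_request(prompt, model="local-model", temperature=0.7):
    """Build an OpenAI-style chat completion request (nothing is sent yet)."""
    payload = {
        "model": model,  # hypothetical placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Pull the assistant text out of an OpenAI-style response dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt):
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        return extract_reply(json.load(resp))
```

With a server running, `print(ask("What is the capital of France?"))` should print the model's answer. Because only the base URL is specific to the local setup, the same code should work against any other OpenAI-compatible server.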

The other answers recommend paths that give you (1) less control and (2) projects with smaller ecosystems.

If you want a truly general purpose front-end for LLMs, the only good solution right now is oobabooga: https://github.com/oobabooga/text-generation-webui

The alternatives have only a small fraction of the features that oobabooga supports, and they support only a fraction of the LLM backends that it does.

If you don't care about the details of how those model servers work, then something that abstracts out the whole process like LM Studio or Ollama is all you need.

However, if you want to get into the weeds of how this actually works, I recommend you look up model quantization and some libraries like ggml[1] that actually do that for you.

[1] https://github.com/ggerganov/ggml
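
For a feel of what quantization means, here is a toy sketch of block-wise 4-bit quantization in plain Python. It illustrates the general idea (store small integers plus one scale per block) and is not ggml's actual file format or kernels. The function names and the block size of 32 are assumptions chosen for the example.

```python
def quantize_block(weights, bits=4):
    """Map a block of floats to signed ints plus one shared float scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed values
    largest = max(abs(w) for w in weights)
    scale = largest / qmax if largest > 0 else 1.0
    # Round each weight to the nearest representable step and clamp.
    quants = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the ints and the scale."""
    return [q * scale for q in quants]


def quantize(weights, block_size=32, bits=4):
    """Quantize a flat list of weights block by block."""
    return [
        quantize_block(weights[i:i + block_size], bits)
        for i in range(0, len(weights), block_size)
    ]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.0, 0.61, -0.88]
    scale, quants = quantize_block(weights)
    restored = dequantize_block(scale, quants)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print("quantized:", quants)
    print("worst-case error:", worst)
```

Each restored weight is off by at most half a quantization step (`scale / 2`), which is the precision you trade for the smaller size.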

You can try getting some pre-trained (sometimes fine-tuned) models from Hugging Face and following their instructions. Good luck!

A bit "draw the rest of the owl" there.
it's all pretty well put together nowadays honestly.

here's a dead simple way: (1) download LM Studio[0] and install it, (2) download a model from within the client when prompted, (3) have a ball.

the program is fairly intuitive, it takes care of finding the relevant files, and it can even accept addendum prompts and various ways to flavor or specialize answers.

Learn the basics there, take what you learn to a more 'industrial' playground later on.

[0]: https://lmstudio.ai/

I use LM Studio and continue.dev, usually with a DeepSeek model, but I try out other models every now and then.

You can use Ollama and download many models. Performance depends on your laptop's capacity.
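
As a rough way to judge whether a model fits your laptop, the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below is back-of-the-envelope arithmetic only: it ignores the context cache, runtime overhead, and per-block metadata in real quantized files, so treat the result as a lower bound. The function name is hypothetical.

```python
def weights_size_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights in gigabytes (10**9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


for name, params, bits in [
    ("3B at 4-bit", 3, 4),
    ("7B at 4-bit", 7, 4),
    ("7B at 16-bit", 7, 16),
]:
    print(f"{name}: ~{weights_size_gb(params, bits):.1f} GB")
```

By this estimate a 7B model needs roughly 3.5 GB for its weights at 4 bits versus about 14 GB at 16 bits, which is why quantized models are the usual choice on laptops.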