HN Mirror



	by jianfgo 849 days ago
	Does anyone have a tutorial on how to set up and run your own self-hosted model?

8 comments

fbdab103 849 days ago

I think the llm utility[0] (the one from Simon, not Google) is probably the best quickstart experience you can find. It gives you the option to connect to hosted services via API or to install and run local models.

As simple as

  pip install llm
  # add the local plugin
  llm install llm-gpt4all
  # Download and run a prompt against the Orca Mini 3B model
  llm -m orca-mini-3b-gguf2-q4_0 'What is the capital of France?'

Alternatively, you could use llamafile[1], a tiny binary runner that gets packaged on top of the multi-gigabyte model weights. Download the llamafile and you can launch it from your terminal or use it through a web browser.

From the llamafile page, after you download the file, you can just launch it as

  ./mistral-7b-instruct-v0.2.Q5_K_M.llamafile -ngl 9999 --temp 0.7 -p '[INST]Write a story about llamas[/INST]'

[0] https://llm.datasette.io/en/stable/index.html

[1] https://github.com/Mozilla-Ocho/llamafile

Edit: added llm quickstart from the intro page


jassyr 849 days ago

Reddit community r/LocalLlama has great info


gettodachoppa 848 days ago

https://docs.sillytavern.app/usage/local-llm-guide/how-to-us...

Follow the guide all the way until you get to "Loading our model in Oobabooga". Then ignore the rest. You can do inference in Ooba under the Notebook tab.

(You can also ignore the "enabling HTTP API" parts, though the API is quite handy: it's OpenAI-compatible, which means you can use any OpenAI-compatible web UI.)
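To make "OpenAI-compatible" concrete, here is a minimal sketch of a client that talks to such a local endpoint using only the Python standard library. The base URL, port, and model name are assumptions for illustration; use whatever address your server prints when it starts. The request and response shapes follow the OpenAI chat completions format.

```python
import json
import urllib.request

# Assumption: address of the local OpenAI-compatible server; adjust as needed.
BASE_URL = "http://127.0.0.1:5000/v1"


def build_chat_request(prompt, model="local-model", temperature=0.7):
    """Build the JSON body for an OpenAI-style chat completion call."""
    return {
        "model": model,  # hypothetical placeholder name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }


def chat(prompt, base_url=BASE_URL):
    """POST the request and return the text of the first reply."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


# Usage, once a server is actually running:
#   print(chat("What is the capital of France?"))
```

Because the request shape is the same everywhere, pointing this at a different OpenAI-compatible server should only require changing `BASE_URL`.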


Der_Einzige 849 days ago

The other answers recommend paths that give you (1) less control and (2) projects with smaller ecosystems.

If you want a truly general purpose front-end for LLMs, the only good solution right now is oobabooga: https://github.com/oobabooga/text-generation-webui

The alternatives have only a small fraction of the features that oobabooga has, and they support only a fraction of the LLM backends it supports.


manca 849 days ago

If you don't care about the details of how these model servers work, then something that abstracts away the whole process, like LM Studio or Ollama, is all you need.

However, if you want to get into the weeds of how this actually works, I recommend you look up model quantization and some libraries like ggml[1] that actually do that for you.

[1] https://github.com/ggerganov/ggml
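For a feel of what quantization does, here is a simplified sketch of block-wise absmax quantization: a block of float weights is stored as small integers plus one shared scale. This is an illustration under my own assumptions, not ggml's actual storage format; the function names and sample weights are made up.

```python
def quantize_block(values, bits=4):
    """Map a block of floats to signed integers that share one scale."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    quants = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, quants


def dequantize_block(scale, quants):
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * q for q in quants]


# Made-up sample weights, for illustration only.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.0]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(quants)
print(f"scale={scale:.4f} max_error={max_error:.4f}")
```

Each weight now needs only 4 bits plus a share of the per-block scale, at the cost of a rounding error of at most half a step (`scale / 2`) per weight.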


tuanm 849 days ago

You can try getting some pre-trained (sometimes fine-tuned) models from HuggingFace and following their instructions. Good luck!


fbdab103 849 days ago

That's a bit "draw the rest of the owl."


serf 849 days ago

It's all pretty well put together nowadays, honestly.

Here's a dead simple way: (1) download LM Studio[0] and install it, (2) download a model from within the client when prompted, (3) have a ball.

The program is fairly intuitive. It takes care of finding the relevant files, and it can even accept addendum prompts and various ways to flavor or specialize answers.

Learn the basics there, then take what you learn to a more 'industrial' playground later on.

[0]: https://lmstudio.ai/


cosmosgenius 849 days ago

I use LM Studio and continue.dev. Usually a Deepseek model, but I try out other models every now and then.


alwinaugustin 849 days ago

You can use Ollama and download many models. Performance depends on your laptop's capacity.
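As a rough way to judge whether a model fits your machine, the size of the weights is about parameters times bits per weight divided by 8. The sketch below does only that arithmetic. It ignores context/KV-cache memory and runtime overhead, and real quantized files carry some extra metadata, so treat the numbers as a lower bound rather than a guarantee.

```python
def estimate_weight_memory_gb(n_params_billion, bits_per_weight):
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Illustrative model sizes and precisions, not measurements.
for name, params, bits in [
    ("3B @ 4-bit", 3, 4),
    ("7B @ 4-bit", 7, 4),
    ("7B @ 16-bit", 7, 16),
]:
    print(f"{name}: ~{estimate_weight_memory_gb(params, bits):.1f} GB")
```

By this estimate, a 7B model quantized to 4 bits needs roughly 3.5 GB for its weights, versus about 14 GB at 16-bit precision.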
