| HN Mirror



	by mosselman 926 days ago
	Which models do you run and how?

3 comments

verdverm 926 days ago

https://github.com/ggerganov/llama.cpp is a popular local-first approach. LLaMA is a good place to start, though I typically use a model from Vertex AI via its API.


mosselman 925 days ago

Thanks. I have llama.cpp locally. How do you use it in scripts? As in how do you specifically, not how would one.


d3nj4l 924 days ago

I have ollama's server running, and I interact with it via the REST API. My preferred model right now is Intel's neural chat, but I'm going to experiment with a few more over the holidays.
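For anyone wanting to script this, here is a minimal sketch of calling a local ollama server over its REST API using only the Python standard library. It assumes the server is listening on ollama's default address (localhost:11434) and that a model has already been pulled under the name "neural-chat"; the helper names build_request and generate are hypothetical, not something from the comment above.

```python
import json
import urllib.request

# Assumption: ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="neural-chat"):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="neural-chat"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, generate("Summarize this in one line: ...") should return the model's reply as a plain string. Setting "stream" to False keeps the script simple, since the server then answers with a single JSON object rather than a sequence of chunks.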


verdverm 923 days ago

I tried ollama today and it is super easy, but finding good models is definitely going to be challenging. I tried a few on some sample (JSON) tasks and it is... frustrating... how they elide output or fail to follow instructions.

Is there a good fine-tuning workflow with ollama?


d3nj4l 919 days ago

I haven’t tried any fine-tuning so I can’t help there, sorry. Though I will say that neural chat has been pretty good to me, even though I have definitely observed it ignoring instructions at times, like adding a “here’s your json:” preamble in response to a query that specifically requests only JSON.
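One way to cope with that kind of preamble in a script is to scan the reply for the first parseable JSON value and ignore whatever surrounds it. This is only a sketch of a workaround, not something described in the thread; extract_json is a hypothetical helper name.

```python
import json


def extract_json(text):
    """Return the first JSON object or array found in text, skipping any preamble."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                # raw_decode parses one JSON value starting at index i
                # and ignores anything that follows it.
                value, _ = decoder.raw_decode(text, i)
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from here; keep scanning
    raise ValueError("no JSON value found in model output")
```

For example, extract_json('Here\'s your json: {"name": "ollama"}') returns the dict {"name": "ollama"}. It will not rescue output that is truncated or otherwise malformed, so the ValueError still needs handling.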


d3nj4l 924 days ago

I use ollama (https://ollama.ai/), which supports most of the big new local models you might've heard of: llama2, mistral, vicuna, etc. Since I have 16GB of RAM, I stick to the 7b models.
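As a rough back-of-the-envelope check on the RAM constraint, the size of the weights alone is about parameter count times bits per weight divided by 8. The sketch below assumes that simple formula and ignores context cache, runtime overhead, and whatever else is using memory, so treat the numbers as lower bounds; the bit widths are illustrative assumptions, not settings mentioned in the comment.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in decimal GB."""
    return params_billion * bits_per_weight / 8


# Illustrative cases: 7B at 16-bit and 4-bit, 13B at 4-bit.
for params, bits in [(7, 16), (7, 4), (13, 4)]:
    print(f"{params}B at {bits}-bit: ~{weight_gb(params, bits):.1f} GB")
```

By this estimate a 7b model needs about 14 GB at 16 bits per weight and about 3.5 GB at 4 bits.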


_joel 926 days ago

Not OP, but this is handy: https://lmstudio.ai/


mosselman 925 days ago

Thanks!
