| HN Mirror



	by loughnane 639 days ago
	What's the most cost-effective option for hosting an LLM these days? I don't need to train; I just want to use one of the Llama models for inference to reduce my reliance on third parties.

5 comments

weweersdfsd 639 days ago

If you don't need a big model and are fine with hosting locally, an RTX 3060 with 12GB VRAM is going to do just fine. It can be bought for about 200-300 USD.
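A rough back-of-the-envelope check of why an 8B model can fit in 12GB, as a sketch only: it counts weight storage alone, and the bit widths shown are assumptions about how the model is quantized. The KV cache and runtime overhead need extra room on top of this. The function name `weight_memory_gb` is made up for illustration.

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Ignores KV cache, activations, and framework overhead, which all add more.
    """
    total_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed precisions: 16-bit (unquantized half precision), 8-bit, 4-bit.
for bits in (16, 8, 4):
    gb = weight_memory_gb(8, bits)
    print(f"8B parameters at {bits}-bit: about {gb:.0f} GB of weights")
```

By this estimate the 16-bit weights alone exceed 12GB, while 8-bit or 4-bit quantized weights leave headroom on a 12GB card.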

I've been pleasantly surprised by what such a mediocre GPU and Llama 3 8B can do for certain (simple) use cases. Ollama makes it all pretty easy.
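For anyone who wants to see what "pretty easy" can look like, here is a minimal sketch of calling a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that a model tagged `llama3` has already been pulled; the helper names `build_payload` and `generate` are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "llama3") -> dict:
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt: str, model: str = "llama3", url: str = OLLAMA_URL) -> str:
    """Send the prompt to the local server and return the generated text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # With streaming disabled, the generated text is in the "response" field.
    return reply["response"]


# Usage (needs a running Ollama server with the model pulled):
#   print(generate("Explain in one sentence what a GPU does."))
```

Nothing leaves the machine here, which is the point if the goal is less reliance on third parties.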


PaulRobinson 639 days ago

Depends on the specific model and your perf requirements, but lots of them will run on a single box with a middle-of-the-road GPU. If your invocation rate is low, hosted solutions like AWS Bedrock or other hosted APIs might be cheaper.
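The low-invocation-rate point can be made concrete with a break-even estimate. This is only a sketch: the hosted price per million tokens and the local running cost below are placeholder numbers, not real quotes from any provider, and the hardware cost is just the middle of the 200-300 USD range mentioned elsewhere in this thread. The function name `break_even_tokens` is made up.

```python
def break_even_tokens(
    hardware_cost_usd: float,
    api_price_per_million_usd: float,
    local_cost_per_million_usd: float = 0.0,
) -> float:
    """Tokens you must process before buying hardware beats paying per token.

    Returns infinity when running locally is not cheaper per token at all.
    """
    saving_per_million = api_price_per_million_usd - local_cost_per_million_usd
    if saving_per_million <= 0:
        return float("inf")
    return hardware_cost_usd / saving_per_million * 1_000_000


# Placeholder inputs (assumptions, not real prices):
gpu_cost = 250.0      # USD, middle of the 200-300 range quoted in the thread
hosted_price = 0.50   # USD per million tokens, hypothetical
local_running = 0.25  # USD per million tokens for electricity, hypothetical

tokens = break_even_tokens(gpu_cost, hosted_price, local_running)
print(f"Break-even at roughly {tokens / 1e6:.0f} million tokens")
```

If expected usage is far below the break-even figure, a hosted API is probably the cheaper route; far above it, the local box pays for itself.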


exe34 639 days ago

Consider also an online Llama-as-a-service provider like deepinfra. I have a local 3090 for playing around with the smaller models, but it's nice having the option of calling the 405B model.


loughnane 638 days ago

Ooh, I like that. I can see using them as a stepping stone: I'd be using an open-source model without the hassle of setting up my own machine, which I could still do later.


l5870uoo9y 639 days ago

Have you tried seeing how far you can get with a Hetzner VPS with dedicated CPU(s)?


rglullis 639 days ago

Locally? I purchased a 7900 XTX GPU for ~900€, and I've been using Ollama to try different models on it.
