What’s the most cost-effective option for hosting an LLM these days? I don’t need to train; I just want to use one of the Llama models for inference to reduce my reliance on third parties.
If you don't need a big model and are fine with hosting locally, an RTX 3060 with 12GB of VRAM is going to do just fine. It can be bought for about 200-300 USD.
I've been pleasantly surprised by what such a mediocre GPU and Llama 3 8B can do for certain (simple) use cases. Ollama makes it all pretty easy.
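For anyone wondering what "pretty easy" looks like in practice, here is a minimal sketch of calling a locally running Ollama server from Python. It assumes Ollama is installed and listening on its default port (11434), and that a model has already been pulled under the tag `llama3:8b`; the helper names `build_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="llama3:8b"):
    """Build a non-streaming generate request (the model tag is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model="llama3:8b", timeout=120):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        # With "stream": False the server replies with a single JSON object.
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    print(generate("In one sentence, why might someone self-host an LLM?"))
```

Nothing leaves the machine here, which is the whole point if the goal is to cut reliance on third parties.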
Depends on the specific model and your performance requirements, but lots of them will run on a single box with a middle-of-the-road GPU. If your invocation rate is low, a hosted service like AWS Bedrock or another hosted API might be cheaper.
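To make the "low invocation rate" point concrete, here is a rough break-even sketch. The function name and the hosted price are placeholders, not real quotes from any provider; the 250 USD hardware figure is just the midpoint of the 200-300 USD range mentioned in this thread. Plug in your own numbers.

```python
def breakeven_tokens(hardware_usd, hosted_usd_per_million, power_usd_per_million=0.0):
    """Tokens you must process before buying hardware beats paying per token.

    All inputs are your own estimates; this ignores resale value, your time,
    and the rest of the machine around the GPU.
    """
    saving_per_million = hosted_usd_per_million - power_usd_per_million
    if saving_per_million <= 0:
        # Local running costs already match or exceed the hosted price.
        return float("inf")
    return hardware_usd / saving_per_million * 1_000_000


if __name__ == "__main__":
    # Placeholder price of 0.50 USD per million tokens, NOT a real quote.
    # 250 / 0.50 * 1e6 = 500 million tokens to break even.
    print(f"{breakeven_tokens(250, 0.50):,.0f} tokens")
```

If your usage is far below the break-even figure you get, the hosted route is likely the cheaper one.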
Also consider an online Llama-as-a-service provider like DeepInfra. I have a local 3090 for playing around with the smaller models, but it's nice having the option of calling the 405B.
Ooh, I like that. I can see using them as a stepping stone: I'd be using an open source model without the hassle of setting up my own machine (though I could do that later).