| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by driese 57 days ago
	Nice one! Let's say I'm serving local models via vllm (because ollama comes with huge performance hits), how would I implement that in gomodel?

2 comments

santiago-pl 56 days ago

I've released a new version of GoModel (0.1.20) with explicit support for vllm. You can now use it even with a few vLLM instances. Like this:

  docker run --rm -p 8080:8080 \
    -e VLLM_BASE_URL=http://host.docker.internal:18000/v1 \
    -e VLLM_BASEMENT_BASE_URL=http://host.docker.internal:18000/v1 \
    enterpilot/gomodel:latest

link

devmor 57 days ago

This is way more interesting to me as well. I have projects that use small limited-purpose language models that run on local network servers and something like this project would be a lot simpler than manually configuring API clients for each model in each project.

link

santiago-pl 57 days ago

Thanks for raising it! Since vLLM has an OpenAI-compatible API, this should work for now:

  docker run --rm -p 8080:8080 \
    -e OPENAI_API_KEY="some-vllm-key-if-needed" \
    -e OPENAI_BASE_URL="http://host.docker.internal:11434/v1" \
    ...
    enterpilot/gomodel

I'll add a more convenient way to configure it in the coming days.

link