Hacker News new | ask | show | jobs
by beebaween 480 days ago
What's the best way to run this is I prefer to use local GPUs?
2 comments

We’re adding this as we speak. Ollama support is already there, and here’s vLLM inference: https://github.com/vlm-run/vlmrun-hub/pull/120
You can try out some of our schemas with Ollama if you want: https://github.com/vlm-run/vlmrun-hub (instructions in Readme)