


	by wsxiaoys 522 days ago
	> So using 2 NVLinked GPU's with inference is not supported?

	To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example.

1 comments

SOLAR_FIELDS 521 days ago

I see. So I can either have Tabby be my LLM server, with this limitation, or turn that feature off and point Tabby at my self-hosted LLM like any other OpenAI-compatible endpoint?


wsxiaoys 521 days ago

Yes - however, the FIM model requires careful configuration to properly set the prompt template.
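
To make the prompt template point concrete, here is a minimal sketch of what filling in a FIM (fill-in-the-middle) template involves. The sentinel tokens `<PRE>`, `<SUF>`, `<MID>` and the `{prefix}`/`{suffix}` placeholders are assumptions for illustration (they follow a CodeLlama-style layout); the right tokens and spacing depend on the model being served, and `build_fim_prompt` is a hypothetical helper, not part of Tabby.

```python
import re

# Assumed, CodeLlama-style template. Other FIM models use different
# sentinel tokens, so check the model card before copying this.
FIM_TEMPLATE = "<PRE> {prefix} <SUF>{suffix} <MID>"


def build_fim_prompt(prefix: str, suffix: str, template: str = FIM_TEMPLATE) -> str:
    """Fill a FIM template with the code before and after the cursor.

    Substitution is done in a single pass over the template, so braces
    (or the literal text "{suffix}") inside the source code are left alone.
    """
    parts = {"prefix": prefix, "suffix": suffix}
    return re.sub(r"\{(prefix|suffix)\}", lambda m: parts[m.group(1)], template)


# Code before the cursor, then code after the cursor.
prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

The model is then expected to generate the missing middle after the final sentinel. If the tokens or whitespace do not match what the model was trained on, completions tend to degrade noticeably, which is likely why careful configuration is needed.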
