| HN Mirror



	by menaerus 600 days ago
	How do you run multiple queries from multiple clients simultaneously on the same HW without them affecting each other's context?

1 comments

lostmsu 600 days ago

It depends on the framework. Here's an LLamaSharp example: https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Exa...


menaerus 600 days ago

My question wasn't about how to run multiple queries against the LLM, but rather how it is even possible, from a transformer-architecture PoV, for a single LLM to serve multiple different end clients. I'm probably missing something, but I can't figure it out yet.


lostmsu 600 days ago

If you have a branchless program, you can execute the same step of the program on multiple different inputs at the same time. https://en.wikipedia.org/wiki/SIMD
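
To make that concrete, here is a toy sketch in plain Python. It is not how any real inference server is written, and every name in it (WEIGHTS, step, batched_step) is made up for illustration. The idea it tries to show: the weights are shared and read-only, each client keeps its own context, and one batched step applies the same arithmetic to every context without mixing them.

```python
# Toy illustration only: all names here are hypothetical, not a real library API.

# Shared, read-only "model parameters". Every client uses the same ones.
WEIGHTS = [0.5, -1.0, 2.0]


def step(context):
    """One branchless step: a weighted sum over the last few values of ONE context."""
    window = context[-len(WEIGHTS):]
    return sum(w * x for w, x in zip(WEIGHTS, window))


def batched_step(contexts):
    """Apply the same step to every client's context.

    Real hardware would process the rows in lockstep (SIMD-style); a plain
    loop stands in for that here. No row ever reads another row's data.
    """
    return [step(ctx) for ctx in contexts]


# Each client has its own private context (a stand-in for its token history).
client_a = [1.0, 2.0, 3.0]
client_b = [4.0, 0.0, -1.0]

outputs = batched_step([client_a, client_b])

# Each output is appended only to the context it was computed from.
client_a.append(outputs[0])
client_b.append(outputs[1])

print(outputs)
```

Running the two clients together gives the same result as running each one alone, which is the sense in which batching does not let one client's context affect another's.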
