Hacker News
by menaerus 553 days ago
How do you run multiple queries from multiple clients simultaneously on the same hardware without them affecting each other's context?

It depends on the framework. Here's a LLamaSharp example: https://github.com/SciSharp/LLamaSharp/blob/master/LLama.Exa...
My question wasn't about how to run multiple queries against the LLM, but rather how it is even possible, from the transformer architecture's point of view, for a single LLM to serve multiple different end clients. I'm probably missing something, but I can't figure it out yet.
If you have a branchless program, you can execute the same step of the program on multiple different inputs. https://en.wikipedia.org/wiki/SIMD
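To connect the SIMD analogy to the transformer question: the model weights are read-only and shared, while each client's context lives in its own per-request state. Real inference servers typically keep that state as a separate KV cache per sequence. Below is a toy sketch, not how LLamaSharp or any other framework actually implements it. The weights `W` and the helpers `matmul`, `attend` and `decode_step` are hypothetical. One batched step applies the same operations to one row per client, and each row only ever reads its own cache.

```python
import math

# Hypothetical toy weights, shared read-only by every client (a stand-in for model parameters).
W = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
    [-0.4, 0.2, 0.9],
]


def matmul(rows, weights):
    """Multiply a batch of row vectors by the shared weights.

    Output row i depends only on input row i, so clients in the same batch never mix.
    """
    cols = len(weights[0])
    return [
        [sum(row[k] * weights[k][j] for k in range(len(weights))) for j in range(cols)]
        for row in rows
    ]


def attend(query, keys):
    """Softmax attention of one query over one client's own history (keys double as values)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [sum(w * key[d] for w, key in zip(weights, keys)) / total for d in range(len(query))]


def decode_step(caches, tokens):
    """One batched step: the same operations run for every client, one row each.

    caches[i] is client i's private context (a toy KV cache); tokens[i] is its new input.
    """
    hidden = matmul(tokens, W)  # one shared matrix multiply for the whole batch
    outputs = []
    for cache, h in zip(caches, hidden):
        cache.append(h)  # each client appends only to its own cache...
        outputs.append(attend(h, cache))  # ...and attends only over its own history
    return outputs


# Two clients with different inputs, served together in one batch.
a_inputs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_inputs = [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

batch_caches = [[], []]
batched_a = [decode_step(batch_caches, [a, b])[0] for a, b in zip(a_inputs, b_inputs)]

# The same client served alone gets exactly the same outputs.
solo_cache = [[]]
solo_a = [decode_step(solo_cache, [a])[0] for a in a_inputs]
print(batched_a == solo_a)  # True
```

The key idea is the split between shared weights and per-client state. Because nothing in the batched step lets row i read another client's cache, batching changes throughput, not results. On real GPUs, batched floating-point kernels may reorder operations, so outputs can differ in the last bits. This toy avoids that by doing identical arithmetic in both cases.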