I figured I'd tell you a little bit more about the project. In a previous life I was the lead AI researcher at a healthcare startup, and whilst my team and I loved the challenge (this was before the GPT craze), it was super frustrating that every time we showed a prototype, it would take ages to bring the model into the product, if it got there at all, where it could actually be useful to the user.
My personal struggles were with access to hardware (GPUs, I'm looking at you), but also with the fragility of the entire process of putting LLMs into production. The industry is flourishing, and the toolsets, though awesome, are evolving rapidly. This meant that what was supported today wouldn't be tomorrow, and the cost of switching to new libraries was too high.
Kalavai is my solution (literally, a solution made for me) to all of these. Use any hardware to build up an LLM pool, and get out-of-the-box templates to plug and play components of the LLM stack without affecting anything else. Yes, it supports the usual model engines (llama.cpp, vLLM and Petals for now), and you can swap them out without affecting the API layer.
I'd love to see if this is useful to the community. We are targeting those who are struggling like I was. We just want to tinker with new models, not figure out how to install CUDA on a VM to make PyTorch work.