A unified efficient open-source LLM deployment engine for both cloud server and local use cases.
It comes with full OpenAI-compatible API that runs directly with Python, iOS, Android, browsers. Supporting deploying latest large language models such as Qwen2, Phi3, and more.
The MLCEngine presents an approach to universal LLM deployment, glad to know it works for both local servers and cloud devices with competitive performance. Looking forward to exploring it further!
Any ideas on how those edge and cloud models collaborate on compound tasks (e.g. the compound ai systems: https://bair.berkeley.edu/blog/2024/02/18/compound-ai-system...)