| Llamactl is a unified management system for running local LLMs across llama.cpp, MLX, and vLLM backends, with a web dashboard and OpenAI-compatible API. I originally built this because I got tired of constantly SSHing to my server to edit a config just try out a new model. It's grown a lot since then. What it does: Web UI for creating and managing LLM instances from your browser Full llama.cpp model lifecycle - download from HuggingFace, create preset.ini configs with an in-browser editor, load/unload models via router mode Automatic idle timeout, LRU eviction, and instance limits llama.cpp, mlx_lm and vllm backends OpenAI and Anthropic API compatible endpoints (backend-dependent) Multi-node support for distributing instances across hosts Inference API keys with per-instance access control docs: https://llamactl.org/stable/ |