Here's roughly what I did, YMMV. I have a lot of experience managing scale and web platforms, since it's what I did for about a decade, so grain of salt and all. For the sake of argument, let's assume you have access to N Linux machines with GPUs.

I have a small Raspberry Pi acting as my homelab Pi-hole and DNS server, so what better place to run the management UI?! So I wrote a small Bun management plane, nothing fancy, just a React app with user auth plus OpenID Connect for those who like that stuff. From there, you have a compute pool (empty at first, because it requires a deployed agent). I added the ability to SSH directly into a machine and install the "agent" with enough privilege to manage Docker; the agent then talks back to the management plane over WebSockets. It sends a keep-alive / health / status / resource packet every 15 seconds, and it streams when you are looking at logs or accessing a container. I used Codex for most of this work, but I defined the protocol and everything else up front using protobuf (even though the transport is WebSockets). That helped with the "vision" and with keeping a coding agent like Codex on the rails through completion.
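To make the 15-second keep-alive concrete, here is a minimal sketch of how a management plane could track agent liveness. Everything in it is an assumption for illustration: the `Heartbeat` fields, the `AgentRegistry` class, and the "three missed packets means offline" rule are hypothetical, and plain TypeScript interfaces stand in for whatever the real protobuf messages look like. Only the 15-second interval comes from the description above.

```typescript
// Hypothetical heartbeat shape; the real protobuf schema is not shown in the post.
interface Heartbeat {
  agentId: string;
  cpuPercent: number;
  memUsedBytes: number;
  gpuUtilPercent: number[]; // one entry per GPU on the machine
}

const HEARTBEAT_INTERVAL_MS = 15000;   // from the post: one packet every 15 seconds
const MISSED_BEATS_BEFORE_OFFLINE = 3; // assumption: tolerate a few dropped packets

type AgentStatus = "online" | "offline";

class AgentRegistry {
  // agentId -> time (epoch ms) the last heartbeat was received
  private lastSeen: Record<string, number> = {};

  // Called by the WebSocket handler whenever a heartbeat arrives.
  record(hb: Heartbeat, receivedAtMs: number): void {
    this.lastSeen[hb.agentId] = receivedAtMs;
  }

  // An agent is online if it has reported within the allowed window.
  status(agentId: string, nowMs: number): AgentStatus {
    if (!Object.prototype.hasOwnProperty.call(this.lastSeen, agentId)) {
      return "offline"; // never heard from this agent
    }
    const deadline =
      this.lastSeen[agentId] + HEARTBEAT_INTERVAL_MS * MISSED_BEATS_BEFORE_OFFLINE;
    return nowMs <= deadline ? "online" : "offline";
  }
}
```

Passing the clock in as `nowMs` instead of calling `Date.now()` inside `status` keeps the liveness rule easy to test without waiting on real timers.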

Once you have a pool (agents installed on your N Linux machines), you can deploy apps (my way of saying a container with a namespace) or you can deploy agents (my own agent, custom-made for this) that are assigned to a project. I decided org structures are a great way to delegate workloads, so that's how they are modeled. Projects provide the git repo, the Docker registry for images and storage of artifacts, as well as the history of all the prompts the agents have run in the project. That's useful if you want to go back and search through |thinking| tags to figure out the reasoning behind a decision.
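A rough sketch of the org / project / prompt-history model described above, plus a search over the reasoning blocks. All type and function names here (`Org`, `Project`, `PromptRecord`, `extractTagged`, `searchReasoning`) are hypothetical, and the code assumes the reasoning is wrapped in angle-bracket open/close tags named `thinking`; the post does not specify the actual schema or tag format.

```typescript
// Hypothetical data model; the post describes the concepts, not a schema.
interface PromptRecord {
  agentName: string;
  prompt: string;
  response: string; // may contain reasoning wrapped in thinking tags
}

interface Project {
  name: string;
  gitRepoUrl: string;
  registryUrl: string;
  history: PromptRecord[];
}

interface Org {
  name: string;
  projects: Project[];
}

interface ReasoningHit {
  record: PromptRecord;
  reasoning: string;
}

// Return the trimmed contents of every open/close tag pair named `tag` in `text`.
function extractTagged(text: string, tag: string): string[] {
  const open = "<" + tag + ">";
  const close = "<" + "/" + tag + ">";
  const blocks: string[] = [];
  let from = 0;
  while (true) {
    const start = text.indexOf(open, from);
    if (start === -1) break;
    const end = text.indexOf(close, start + open.length);
    if (end === -1) break; // unterminated block: ignore it
    blocks.push(text.slice(start + open.length, end).trim());
    from = end + close.length;
  }
  return blocks;
}

// Case-insensitive keyword search restricted to the reasoning blocks of a project.
function searchReasoning(project: Project, keyword: string, tag: string = "thinking"): ReasoningHit[] {
  const needle = keyword.toLowerCase();
  const hits: ReasoningHit[] = [];
  for (const record of project.history) {
    for (const reasoning of extractTagged(record.response, tag)) {
      if (reasoning.toLowerCase().indexOf(needle) !== -1) {
        hits.push({ record: record, reasoning: reasoning });
      }
    }
  }
  return hits;
}
```

Searching only inside the tagged blocks, rather than the whole response, is what lets you ask "why was this decided" instead of "where was this mentioned".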

All of this was built in maybe a month, with Codex initially, until my agent was up to the task of coding with an endpoint configured (OpenAI API at first, now NVIDIA DGX Sparks). What really works well is the delegation. The agents have a web UI that is exposed via the project URLs, so you can interact with the "scrum masters" of each project. They also share a stream if they are on the same project (but different subprojects).

I too wish there were more information on this, but I didn't let the lack of it stop me from experimenting and finding what works. I came from the Mesos/DCOS era, where you stop thinking about the metal and think in pools of resources. It's a distributed systems problem.
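As a small illustration of "pools, not machines": the caller asks the pool for resources and never names a host. This is only a sketch under stated assumptions. The `Machine` and `Workload` shapes and the `place` function are hypothetical, and the best-fit rule used here is a simple stand-in, not how Mesos allocates resources and not necessarily how the setup above schedules anything.

```typescript
// Hypothetical resource view of one machine in the pool.
interface Machine {
  host: string;
  freeGpus: number;
  freeMemGb: number;
}

// Hypothetical resource request for one container or agent.
interface Workload {
  name: string;
  gpus: number;
  memGb: number;
}

// Best-fit placement (an assumption, chosen for simplicity): among machines that
// can satisfy the request, pick the one with the fewest free GPUs, so larger
// machines stay available for larger workloads. Returns null if nothing fits.
function place(pool: Machine[], workload: Workload): Machine | null {
  let best: Machine | null = null;
  for (const machine of pool) {
    const fits = machine.freeGpus >= workload.gpus && machine.freeMemGb >= workload.memGb;
    if (!fits) continue;
    if (best === null || machine.freeGpus < best.freeGpus) {
      best = machine;
    }
  }
  if (best !== null) {
    // Reserve the resources so the next placement sees the updated pool.
    best.freeGpus -= workload.gpus;
    best.freeMemGb -= workload.memGb;
  }
  return best;
}
```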