by jayphen
71 days ago
AI models on their own are raw, undirected, and inherently probabilistic. A "harness" acts as a control layer wrapped around the model, designed to steer it toward deterministic outcomes. It does this by equipping the model with actionable tools like web search or file I/O, and by orchestrating an evaluation loop that runs until an acceptable result is produced. (Various analogies work here: an astronaut and a space suit, a rocket and the launch pad/mission control... okay, I'm out of analogies that aren't car engines.) You can see this in practice by looking at the leaked Claude Code source code. It is a harness around Anthropic's model built for writing code, and it relies on heavily engineered (and sometimes brittle) steering mechanisms. These range from highly specific situational prompts to deterministic, hard-coded logic that executes based on the model's output. Getting a harness right is incredibly hard and feels like whack-a-mole at times.
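A minimal sketch of the loop described above, assuming the model is just a function that returns either a tool call or a final answer. Everything here is hypothetical and for illustration only: the tool names (`read_file`, `web_search`), the `Step` type, `run_harness`, and the scripted stand-in model are not taken from Claude Code or any real harness, and the "acceptable result" check is simply a predicate the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


# Hypothetical tools: illustrative stand-ins, not any real harness's API.
def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


def web_search(query: str) -> str:
    # Stub: a real harness would call a search service here.
    return f"(no results for {query!r} in this offline sketch)"


TOOLS: Dict[str, Callable[[str], str]] = {
    "read_file": read_file,
    "web_search": web_search,
}


@dataclass
class Step:
    tool: Optional[str]  # tool the model wants to call, or None for a final answer
    arg: str             # tool argument, or the final answer text


def run_harness(
    model: Callable[[List[str]], Step],
    task: str,
    is_acceptable: Callable[[str], bool],
    max_turns: int = 5,
) -> str:
    """Loop until the model produces an answer that passes a deterministic check."""
    transcript = [f"TASK: {task}"]
    for _ in range(max_turns):
        step = model(transcript)
        if step.tool is not None:
            # Deterministic dispatch: the harness, not the model, runs the tool.
            if step.tool not in TOOLS:
                transcript.append(f"ERROR: unknown tool {step.tool!r}")
                continue
            try:
                result = TOOLS[step.tool](step.arg)
            except Exception as exc:  # feed tool failures back to the model
                result = f"ERROR: {exc}"
            transcript.append(f"TOOL {step.tool}: {result}")
            continue
        if is_acceptable(step.arg):
            return step.arg
        # Reject and let the model try again with the feedback in context.
        transcript.append(f"REJECTED: {step.arg!r}; try again")
    raise RuntimeError("no acceptable result within max_turns")


def scripted_model(steps: List[Step]) -> Callable[[List[str]], Step]:
    """Hypothetical stand-in for an LLM: replays a fixed list of steps."""
    it = iter(steps)
    return lambda transcript: next(it)


if __name__ == "__main__":
    model = scripted_model([
        Step("web_search", "harness"),
        Step(None, "maybe"),  # fails the check, so the loop continues
        Step(None, "42"),
    ])
    print(run_harness(model, "answer with digits only", str.isdigit))  # prints 42
```

The probabilistic part is isolated in `model`; tool dispatch, the retry cap, and the acceptance check are plain code around it. A real harness layers much more on top (situational prompts, hard-coded handling of specific outputs), which is where the brittleness described above tends to creep in.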
It took me, off and on, three years to realize I was doing it backwards. Agent was originally born after I rewrote CloneTool as a more generic disk cloning tool with an SMAppService Launch Daemon.
After I completed CloneTool, I thought: mmmmm, what if I connected an LLM to the daemon? It rattled off 50 things it could do, and it had no knowledge of any of this anywhere in the harness, system prompt, or tools. It had simply figured out its environment on its own.
I never ran Agent under that scenario; it definitely has a harness now. And yes, getting the harness right is the number one challenge, and once you get it working well with most LLMs out of the box, you try not to change it, because that sweet spot is hard to come by. Not to say it never gets tweaked, but the further in you go, the more you cringe at a change that may break it.