|
|
|
|
|
by niyikiza
50 days ago
|
|
>>harnesses should have more assertive layers of control and constraint Been saying this for a while and mostly getting blank stares. In-context "controls" as the primary safety mechanism is going to be a bitter lesson for our industry.
What you want is a deterministic check outside the model's reasoning that decides allow/deny without consulting its opinion. Cryptographic if the record needs to survive a compromised orchestrator, and open source. If your control is a string the model can read, the model can ignore it. If it can write it, it can forge it. I'm surprised how strange that idea sounds to some people. Disclosure: I'm working on an open source authorization tool for agents. |
|
I think a lot of people using the models genuinely feel like the models are more capable than they are now, and they're content to relinquish a lot of trust and agency. The worrying thing is that the models are superficially hyper-capable, but from more granular perspectives, you can see a lot of holes in their abilities. This is incredibly important, but very difficult to convey concisely to people. It's a classic example of nuance seeming too complicated because not caring is so much more gratifying. People love using these models.