by raffisk
145 days ago
I introduced the Determinism-Faithfulness Assurance Harness (DFAH) in a new paper, "Replayable Financial Agents", along with open-source code. A few findings:
- Determinism and faithfulness are positively correlated (r = 0.45) across the tasks in my experiments.
- Schema-first Tier 1 models (7–20B parameters) stay near the 95% compliance threshold under stress.
- Frontier models performed well on some tasks (e.g., strong action determinism in agentic triage), but the matrix helps define when human-in-the-loop (HITL) review is still needed.

Note: I didn't control the inference engines or infrastructure for these experiments; I used local models and frontier APIs.

Paper: https://arxiv.org/abs/2601.15322