by kxbnb
140 days ago
Nice execution on the replay testing with semantic diff. That's a pain point that's hard to solve with metrics alone. One thing I've noticed building toran.sh (HTTP-level observability for agents): there's a gap between "what the agent decided to do" (your trace level) and "what actually went over the wire" (raw requests/responses). It shows up especially with retries, timeouts, and provider failovers, where the trace might show success but the HTTP layer tells a different story. Do you capture the underlying HTTP calls, or is it primarily at the SDK/trace level? Asking because debugging often ends up needing both views.
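
To make the gap concrete, here's a rough sketch of what I mean. It is not how toran.sh or your tool works; the transport is faked, and every name in it (`WireLog`, `flaky_transport`, `call_with_retries`, the `provider.example` URL) is made up for illustration. The retry wrapper absorbs two 503s, so the trace-level span reports a clean success while the wire log shows three attempts.

```python
from dataclasses import dataclass, field


@dataclass
class WireLog:
    """Hypothetical HTTP-level record: one entry per attempt actually sent."""
    attempts: list = field(default_factory=list)


def flaky_transport(statuses):
    """Fake transport (assumption): returns the given status codes in order."""
    remaining = iter(statuses)

    def send(url):
        return next(remaining)

    return send


def call_with_retries(send, url, wire, max_attempts=3):
    """Retry on 5xx and return the final status; log every attempt to `wire`."""
    status = None
    for attempt in range(1, max_attempts + 1):
        status = send(url)
        wire.attempts.append((attempt, url, status))
        if status < 500:
            break
    return status


wire = WireLog()
send = flaky_transport([503, 503, 200])
final = call_with_retries(send, "https://provider.example/v1/chat", wire)

# Trace-level view: a single span that only knows the final outcome.
trace_span = {"tool": "llm_call", "ok": final == 200}

print(trace_span)
for entry in wire.attempts:
    print(entry)
```

The span alone would never tell you the provider was failing two out of three requests; you only see that if the per-attempt log is kept next to the trace.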