HN Mirror


by kxbnb 187 days ago

Nice execution on the replay testing with semantic diff - that's a pain point that's hard to solve with just metrics.

One thing I've noticed building toran.sh (HTTP-level observability for agents): there's a gap between "what the agent decided to do" (your trace level) and "what actually went over the wire" (raw requests/responses). Especially with retries, timeouts, and provider failovers - the trace might show success but the HTTP layer tells a different story.

Do you capture the underlying HTTP calls, or is it primarily at the SDK/trace level? Asking because debugging often ends up needing both views.

1 comment

Evanson 186 days ago

Thanks, and great point. Right now, Lumina is mainly SDK/trace-level (what the app thinks happened: tokens, cost, latency, outputs), so you’re right that low-level HTTP details like retries, timeouts, and failovers can be partially hidden. Capturing the raw HTTP layer alongside traces is on our roadmap because production debugging often needs both views. Also, your “see what your agent is actually doing” angle is spot-on. There’s a lot of opaque magic in agent frameworks. Curious how you’re doing it in toran.sh: proxy/intercept, or wrapping the SDK HTTP client?
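
To make the trace-versus-wire gap concrete, here is a minimal sketch of the "wrap the HTTP client" option, using only the Python standard library. It is not how Lumina or toran.sh actually work; `RecordingTransport`, `call_with_retries`, `flaky_backend`, and the example URL are all hypothetical names invented for illustration, and the backend is simulated rather than a real network call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WireEvent:
    attempt: int   # 1-based attempt number as seen on the wire
    status: int    # HTTP status code returned for that attempt


@dataclass
class RecordingTransport:
    """Hypothetical wrapper that logs every attempt, including failed ones."""
    send: Callable[[str], int]  # assumed signature: URL in, status code out
    events: List[WireEvent] = field(default_factory=list)

    def __call__(self, url: str) -> int:
        status = self.send(url)
        self.events.append(WireEvent(len(self.events) + 1, status))
        return status


def call_with_retries(transport: Callable[[str], int], url: str,
                      max_attempts: int = 3) -> int:
    """Trace-level view: the caller only ever sees the final status."""
    status = 0
    for _ in range(max_attempts):
        status = transport(url)
        if status < 500:  # stop retrying once the response is not a server error
            break
    return status


def flaky_backend() -> Callable[[str], int]:
    """Simulated provider that fails twice before succeeding."""
    responses = iter([503, 503, 200])
    return lambda url: next(responses)


transport = RecordingTransport(flaky_backend())
final_status = call_with_retries(transport, "https://example.invalid/v1/chat")
wire_statuses = [event.status for event in transport.events]
```

In this sketch the trace-level result (`final_status`) reports a clean success, while `wire_statuses` keeps the two failed attempts that preceded it. A proxy/intercept approach would capture the same information outside the process instead of inside the client.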